In This Issue


This issue of Survey Methodology contains a special section on the Statistical Uses of 
Administrative Data. The five papers in the section cover a diversity of topics ranging from policy 
issues to data processing. 

With the increasing emphasis on the use of administrative records by statistical agencies,
probabilistic matching or record linkage methods are becoming more widespread. Most applied work
is done using the framework described by Fellegi and Sunter (1969). Winkler examines the
importance of an independence assumption that is usually employed in applications involving the
Fellegi-Sunter model because it leads to great computational simplification. In the context of a
problem involving matching lists of businesses, he investigates modifications that can be used
when the independence assumption is not valid.

The paper by Redfern deals with a statistical use of administrative records that is of great 
importance for statistical agencies — the use of administrative records as a source of census data. 
He notes that Denmark has completely abandoned the traditional questionnaire-based census 
in favour of the use of administrative records to obtain census data. In three other European 
countries, some data that were traditionally collected using a census questionnaire are now 
obtained directly from administrative sources. The author considers the situation in the United 
Kingdom in detail. He concludes that public concerns about invasion of privacy, as well as 
political ideology and scarce resources, are blocking the consolidation of administrative 
information from a number of diverse sources into a central population register. He suggests 
that, although political considerations will always carry the greatest weight in any discussion 
of the development of a population register, statisticians have an obligation to make their 
views known. 

Jonas and Hanczaryk note that the role of administrative data at the U.S. Bureau of 
the Census has increased over time. The need for an overall quality management system 
that is responsive to problems related to the processing of very large amounts of data 
was recognized before the 1987 Economic Censuses. The system that was developed involves 
the extensive use of microcomputers to reduce costs. 

Moore and Marquis describe an application involving the use of administrative data in survey
evaluation. Information from the Survey of Income and Program Participation conducted by
the U.S. Bureau of the Census was matched to administrative records for five federal programs
and four state programs using record linkage methods. Analysis of the data set is just
beginning. The objectives of the study are to quantify the effects of measurement errors and to
use this information to derive more efficient survey designs.

Statistics Canada is in the process of reorganizing its programme of economic surveys. A key 
element is the rebuilding of its central register of economic entities, which will serve as the frame 
for economic surveys. Clark and Lussier’s paper outlines the concepts and procedures underlying 
the establishment and maintenance of profiles of economic entities and describes the role of 
administrative data in this task. A number of issues with respect to profiling activities are raised 
following a simulation study. 

In this issue’s first paper, Kott develops a small domain estimator which meets the criterion 
of design consistency introduced by Isaki and Fuller (1982). The mean squared error of this 
estimator is evaluated. Using an empirical example, Kott shows that the mse estimator 
can be used to choose between the proposed small domain estimator and the conventional 
design-based estimator. 




Published estimates for periodic surveys are often based only on the current sample, thereby 
failing to exploit correlations with estimates from previous periods. On the other hand, economists 
and other social scientists frequently ignore the sampling error when using these estimates in 
their time series models. Binder and Dick show how sampling error can be taken into account 
in these models. For readers new to the area, the authors provide a brief review of previous work, 
with an extensive list of references. 

Battese, Hasabelnaby and Fuller investigate a procedure for constructing a composite estimator 
for livestock numbers. The authors use a linear model to pool six types of estimators from the 
U.S. Department of Agriculture June Enumerative Survey over several years. Empirical results 
show the improvements in variance for the optimal linear combination of the six estimators within 
a particular year, with further improvements if the other years’ estimators are included. 

Bethel examines optimal allocation for multipurpose surveys. A study of the sensitivity of the 
optimal allocation to changes in variance constraints is presented. Bethel derives results which can 
be used to determine if survey costs can be reduced significantly by allowing some variances to 
increase marginally. He also presents an iterative algorithm for solving the optimization problem. 

Bruning and Hu provide insight into the issue of survey recall versus diary collection methods. 
They start with a literature review of studies and comparisons of the two methods. The main 
part of the paper deals with an experiment to assess the relationship between several demographic 
factors and the collection methods. The findings confirm those of earlier studies but also strongly 
raise the possibility of measurement problems with the survey recall collection method. 

Quality assurance sampling is applied by Lemeshow and Stroh to the problem of reducing the 
sample size needed to ascertain whether a population meets certain health standards. The example 
used by the authors is the immunization coverage of children in developing countries. The 
sampling method uses an initial sample to test the hypothesis of adequate vaccination by stratum. 
Strata where the test result is not sufficiently conclusive are subjected to additional sampling. 
Robust Small Domain Estimation Using
Random Effects Modeling 


PHILLIP S. KOTT


ABSTRACT 


This paper develops a design consistent small domain estimator using a random effects model. The mean
squared error of this estimator is then evaluated without assuming the random effect component of the
model is correct. Data from a complex sample survey show how this approach to mean squared error
estimation, while perhaps too unstable to be used directly, can be employed to determine whether the
design consistent small domain estimator proposed here is better than the conventional design-based
estimator.


KEY WORDS: Finite population; Model; Mean squared error; Design consistent; Randomization. 


1. INTRODUCTION 


Suppose we were given a probability sample of unit values and were asked to estimate the 
mean of a small domain within the larger population covered by the sample. Scott and Smith 
(1969) introduced a Bayesian estimator for this purpose and showed that their estimator could 
also be developed using only unbiasedness and minimum variance (UMV) criteria. Their UMV 
approach, sometimes called random effects or components-of-variance modeling, will be 
adopted here. 

Most attempts at small domain estimation paralleling Scott and Smith (e.g., Fay and Herriot 
1979, Battese and Fuller 1971, Ghosh and Meeden 1986, Prasad and Rao 1986, Fuller and 
Harter 1987, and Stroud 1987) assume that the sampling design is noninformative and so 
ignorable. The same assumption is made for synthetic estimators of small domain means, which 
will not be discussed at any depth here (for examples of these, see Gonzalez and Hora 1978). 

Assuming a noninformative sampling design misses perhaps the most important contribution
of randomization to inference. Since most statistical models in finite population inference
are either wrong or (at best) incomplete, it is desirable for an estimation strategy to have the
following property: if the sample were large enough, the estimator should approach what it
is estimating almost certainly no matter what the ‘true’ model. This desire receives formal
expression in the criterion of design consistency introduced by Isaki and Fuller (1982).

Design consistency is an asymptotic property. As a result, it is often necessary to hypothesize
a model (or models) when choosing among alternative design consistent estimation strategies.
This is especially true in the case of small domain estimation, where the sample may be
particularly small and the sampling design beyond one’s control. Nevertheless, limiting attention
to design consistent estimators does offer some, albeit small, protection against model failure.
Using this reasoning, Särndal (1984) focused his attention on design consistent small domain
estimators. We will follow that practice here.

Section 2 develops a design consistent random effects estimator for a small domain population
mean. Section 3 introduces a robust (but unstable) estimator for the model and design
mean squared errors of the small domain estimator. It is robust in the sense of not depending
on the necessary, but heroic, model that links the small domains together. Section 4 contains
an empirical example and Section 5 a discussion.


2. THE ESTIMATOR 
We begin with the basic (or fixed effects) model:


$$ y_{gi} = \theta_g + \varepsilon_{gi}, \tag{1} $$


where the $\varepsilon_{gi}$ are uncorrelated random variables with means of zero, and
$\mathrm{var}(\varepsilon_{gi}) = \delta_g^2$. The subscript $gi$ denotes a unit in domain $g$.
There are $N_g$ units in the population from domain $g$ and $m$ domains.

Let us focus on a particular domain $j$. The problem is to estimate the domain mean:


$$ \bar{y}_{jP} = \sum_{i=1}^{N_j} y_{ji}/N_j. $$


Let $p_{ji}$ be the probability of selecting unit $ji$ for the sample and $n_j$ be the number of units
selected from domain $j$. It is well known that a design unbiased and model efficient linear
estimation strategy for $\bar{y}_{jP}$ would set the $p_{ji}$ equal to $n_j/N_j$ and the estimator
equal to $\sum_{i=1}^{n_j} y_{ji}/n_j$, where the units are relabeled so that $j1, \ldots, jn_j$ are in
the sample.

Unfortunately, one is often required in practice to estimate a domain mean using a sample
that has not been selected primarily for that purpose. Consequently, the selection probabilities
within domain $j$ may not all equal $n_j/N_j$. A popular estimator in this circumstance is


$$ d_j = \sum_{i=1}^{n_j} w_{ji}\, y_{ji}, \tag{2} $$

where

$$ w_{ji} = p_{ji}^{-1} \Big/ \sum_{k=1}^{n_j} p_{jk}^{-1} $$


denotes the sampling weight of unit $ji$. This estimator was suggested by Brewer (1963) and Hájek
(1971).
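
As a minimal sketch of equation (2), the Python fragment below computes the normalized
inverse-probability weights and the weighted domain mean. The function names and the three-unit
sample are hypothetical and serve only as an illustration.

```python
def hajek_weights(selection_probs):
    """Normalized inverse-probability weights: w_ji = (1/p_ji) / sum_k (1/p_jk)."""
    inverse = [1.0 / p for p in selection_probs]
    total = sum(inverse)
    return [v / total for v in inverse]


def domain_estimate(values, selection_probs):
    """Weighted domain mean d_j = sum_i w_ji * y_ji, as in equation (2)."""
    weights = hajek_weights(selection_probs)
    return sum(w * y for w, y in zip(weights, values))


# Hypothetical sample of three units from one domain.
y = [10.0, 20.0, 30.0]
p = [0.1, 0.2, 0.2]
w = hajek_weights(p)       # the weights sum to one by construction
d = domain_estimate(y, p)  # units with small selection probability count more
```

When all selection probabilities within the domain are equal, the weights reduce to $1/n_j$ and
the estimate is the simple sample mean.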

The estimator $d_j$ is clearly model unbiased under (1), in the sense that
$E_\varepsilon(d_j - \bar{y}_{jP}) = 0$. Under many sampling designs, $d_j$ is also design
consistent; i.e.,


$$ \underset{n_j \to \infty}{\mathrm{plim}_\pi}\,(d_j - \bar{y}_{jP}) = 0, $$


where $\pi$ denotes the probability space generated by the random selection process rather than
the model in (1).

Isaki and Fuller (1982) give sufficient conditions for $d_j$ to be design consistent, and it is
under most sampling designs in common practice. Notable exceptions involve systematic
sampling from a predetermined list (see Kott 1986). A popular alternative to design consistency
is Brewer’s (1979) asymptotic design unbiasedness (ADU) property. The estimator $d_j$ is always
ADU.

The trouble with $d_j$ is that it may not be very efficient for small $n_j$. One solution is to “draw
strength” from the other domains by treating the fixed parameter $\theta_j$ as if it were a realization
of a random variable satisfying this linking model:


$$ \theta_j = \mu + \tau_j, \tag{3} $$


where $E(\tau_j) = 0$, and $E(\tau_j \tau_g) = \sigma^2$ when $j = g$ and 0 otherwise. This is sometimes
called “random effects modeling,” because the heretofore fixed effect of being a unit in domain $j$,
$\theta_j$, is now being treated as a random variable.

Combining equations (1) and (3) results in the reduced form components-of-variance model:


$$ y_{ji} = \mu + \tau_j + \varepsilon_{ji}. \tag{4} $$


Many analysts start with equation (4). We have separated the basic and linking models to
underscore the greater level of confidence one often has in the validity of the basic model
(especially when it is assumed as part of the linking model that all $\delta_g^2 = \delta^2$, as it
soon will be).

Any estimator of the form:


$$ f_j(a, c) = (1 - a)\,d_j + a\,\hat{\mu}, $$


where

$$ c = (c_1, \ldots, c_{j-1}, 0, c_{j+1}, \ldots, c_m), $$

$$ \hat{\mu} = \sum_{g=1}^{m} c_g\, \bar{y}_{gS}, $$

$$ \bar{y}_{gS} = \sum_{i=1}^{n_g} y_{gi}/n_g, $$

and

$$ \sum_{g=1}^{m} c_g = 1, $$

is unbiased under the model in (4). (Note: although the variables $c$ and $\hat{\mu}$ depend on domain
$j$, additional denotation has been suppressed for simplicity.)

If all the $\delta_g^2$ are assumed equal to $\delta^2$, then using a Lagrangian multiplier technique
it is not difficult to show that the choices for $a$ and the $c_g$ that minimize the model variance of
$f_j(a, c) - \bar{y}_{jP}$ are


$$ a^* = \frac{\sum_{i=1}^{n_j} w_{ji}^2 - 1/N_j}
{\sum_{i=1}^{n_j} w_{ji}^2 + \sigma^2/\delta^2 + \left[\sum_{g \ne j} (\sigma^2/\delta^2 + n_g^{-1})^{-1}\right]^{-1}} \tag{5} $$


and


$$ c_g^* = \frac{(\sigma^2/\delta^2 + n_g^{-1})^{-1}}{\sum_{h \ne j} (\sigma^2/\delta^2 + n_h^{-1})^{-1}}
\quad \text{for } g \ne j. \tag{6} $$

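In practice, $\sigma^2$ and $\delta^2$ are rarely known. Ghosh and Meeden (1986) have proposed
estimating the ratio $\sigma^2/\delta^2$ from the sample in a model consistent manner (as
$m \to \infty$) by


$$ L = \max\left\{0,\ \left[\frac{\sum_g n_g (\bar{y}_{gS} - \bar{y}_S)^2/(m - 1)}
{\sum_g \sum_i (y_{gi} - \bar{y}_{gS})^2/(n - m)} - 1\right]
(m - 1)\Big/\Big(n - \sum_g n_g^2/n\Big)\right\}, \tag{7} $$


where

$$ \bar{y}_S = \sum_g n_g\, \bar{y}_{gS}/n $$

and

$$ n = \sum_g n_g. $$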

Let $a'(L)$ and $c'(L)$ be the right hand sides of equations (5) and (6) respectively with $L$
replacing $\sigma^2/\delta^2$. Now call


$$ e_j = f_j[a'(L), c'(L)] $$


the random effects estimator, where $\hat{\mu}$ in $f_j(\cdot,\cdot)$ is set equal to
$\mu'(L) = \sum_g c_g'(L)\,\bar{y}_{gS}$. As $m$ grows large, $e_j$ becomes indistinguishable from
$f_j(a^*, c^*)$.
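
The following Python sketch assembles the random effects estimator for one domain. It rests on a
particular reading of equations (5) and (6), namely
$a' = (\sum_i w_{ji}^2 - 1/N_j) / (\sum_i w_{ji}^2 + L + [\sum_{g \ne j}(L + 1/n_g)^{-1}]^{-1})$ and
$c_g' \propto (L + 1/n_g)^{-1}$ for $g \ne j$. That reading, the function name, and the input values
are assumptions made for illustration.

```python
def random_effects_estimate(y_j, w_j, N_j, other_means, other_sizes, ratio):
    """Return (e_j, a, mu, d_j) for domain j.

    y_j, w_j     : sampled values and normalized weights in domain j
    N_j          : population size of domain j
    other_means  : sample means of the other domains
    other_sizes  : sample sizes of the other domains
    ratio        : estimate L of sigma^2 / delta^2 (must be >= 0)
    """
    d_j = sum(w * y for w, y in zip(w_j, y_j))
    # Weights c'_g are proportional to (L + 1/n_g)^-1 over domains g != j.
    precisions = [1.0 / (ratio + 1.0 / n_g) for n_g in other_sizes]
    total_precision = sum(precisions)
    c = [p / total_precision for p in precisions]
    mu = sum(c_g * yb for c_g, yb in zip(c, other_means))
    # Shrinkage factor a'(L): larger when the domain weights are more unequal.
    sum_w2 = sum(w * w for w in w_j)
    a = (sum_w2 - 1.0 / N_j) / (sum_w2 + ratio + 1.0 / total_precision)
    e_j = (1.0 - a) * d_j + a * mu
    return e_j, a, mu, d_j


# Hypothetical inputs: two sampled units in domain j, two other domains.
e_j, a, mu, d_j = random_effects_estimate(
    [10.0, 30.0], [0.5, 0.5], 1000, [12.0, 18.0], [4, 4], 0.25
)
```

The estimate always lies between the direct estimate $d_j$ and the pooled value $\mu'(L)$, moving
toward the latter as $a'(L)$ grows.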

If the model in (4) is correct and all the $\delta_g^2 = \delta^2 > 0$, then for sufficiently large
$m$, $L$ must be positive. Even if the model fails, as long as $L$ is bounded from below by a positive
number, $|\mu'(L)|$ is bounded, and $n_j \sum_{i=1}^{n_j} w_{ji}^2$ is bounded as $n_j$ (but not $m$)
grows arbitrarily large, then $e_j$ is design consistent whenever $d_j$ is. This is because


$$ \underset{n_j \to \infty}{\mathrm{plim}_\pi}\,[a'(L)] = 0, $$


so that $e_j$ converges to the design consistent $d_j$.


3. MODEL AND DESIGN MEAN SQUARED ERROR 


Under some sampling designs there exists an estimator of the design variance of $d_j$ that is
also a model unbiased estimator of the variance of $d_j$ as an estimator for $\bar{y}_{jP}$ under the
basic model (henceforth I will omit the clarifying phrase “as an estimator for $\bar{y}_{jP}$” to
simplify the exposition). Often, however, one must settle for a design consistent estimator of the
design mean squared error of $d_j$ (assuming, as we will, one exists). This is particularly true when
$\sum_{k=1}^{n_j} p_{jk}^{-1} \ne N_j$. Kott (1987) shows how (when necessary) this estimator of the
design mean squared error of $d_j$ can be adjusted to be simultaneously a design consistent estimator
of the design mean squared error of $d_j$ and a model unbiased estimator of the variance of $d_j$ under
the basic model. Call this adjusted “variance estimator” $v(d_j)$.

We are now ready to address the model and design mean squared errors of the random effects
estimator, $e_j$. Although we needed to assume that the $\delta_g^2$ were all equal to determine $e_j$,
we need not make that assumption in assessing the accuracy of $e_j$. In fact, we need not even assume
that the linking model in equation (3) holds! Instead, we assume only that $m$ is large enough
so that $L$ may be viewed as (virtually) independent of the units in domain $j$. Alternatively, $L$
can be redefined by excluding units from domain $j$ in the summations on the right hand side
of (7).

Either way, $E_\varepsilon[(d_j - \bar{y}_{jP})(\bar{y}_{jP} - \mu'(L))] = 0$. As a result,


$$ E_\varepsilon[(e_j - \bar{y}_{jP})^2] = [1 - a'(L)]^2\, E_\varepsilon[(d_j - \bar{y}_{jP})^2]
+ [a'(L)]^2\, E_\varepsilon[(\bar{y}_{jP} - \mu'(L))^2]. $$

It is now a simple matter to show that under the basic model in (1),


$$ v(e_j) = [1 - 2a'(L)]\, v(d_j) + [a'(L)]^2\, [d_j - \mu'(L)]^2 $$


is an unbiased estimator of the model mean squared error of $e_j$ given $L$ and $\mu'(L)$. Since
$a'(L)$ is asymptotically zero as $n_j$ approaches infinity, $v(e_j)$ is also a design consistent
estimator of the design mean squared error of $e_j$ whenever $v(d_j)$ is a design consistent
estimator of the design mean squared error of $d_j$.
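
The unbiasedness claim can be checked in a few lines. The sketch below writes $a'$ for $a'(L)$,
$\mu'$ for $\mu'(L)$, treats both as fixed, and assumes that $v(d_j)$ is model unbiased for
$E_\varepsilon[(d_j - \bar{y}_{jP})^2]$ and that the cross term
$E_\varepsilon[(d_j - \bar{y}_{jP})(\bar{y}_{jP} - \mu')]$ vanishes.

```latex
\begin{align*}
E_\varepsilon[(d_j - \mu')^2]
  &= E_\varepsilon[(d_j - \bar{y}_{jP})^2] + E_\varepsilon[(\bar{y}_{jP} - \mu')^2], \\
E_\varepsilon[v(e_j)]
  &= (1 - 2a')\,E_\varepsilon[(d_j - \bar{y}_{jP})^2]
     + a'^2\left\{E_\varepsilon[(d_j - \bar{y}_{jP})^2]
     + E_\varepsilon[(\bar{y}_{jP} - \mu')^2]\right\} \\
  &= (1 - a')^2\,E_\varepsilon[(d_j - \bar{y}_{jP})^2]
     + a'^2\,E_\varepsilon[(\bar{y}_{jP} - \mu')^2] \\
  &= E_\varepsilon[(e_j - \bar{y}_{jP})^2].
\end{align*}
```

The last step uses $e_j - \bar{y}_{jP} = (1 - a')(d_j - \bar{y}_{jP}) + a'(\mu' - \bar{y}_{jP})$
together with the vanishing cross term.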

It is not necessary for $L$ to converge to $\sigma^2/\delta^2$ or $\mu'(L)$ to converge to $\mu$ for
$v(e_j)$ to have the properties described above. In fact, it is not necessary for the limits of $L$
and $\mu'(L)$ to have any interpretations at all, since these properties have been defined
independently of the model in equation (3).

Statisticians often have much more confidence in the basic model in equation (1) than the
linking model in equation (3), especially when the latter is coupled with the assumption of
constant unit variances ($\delta_g^2$) across domains. It is therefore reassuring that the accuracy
of the $e_j$ can be estimated without invoking (3) or requiring that the $\delta_g^2$ be equal.

Unfortunately, $v(e_j)$ is unstable and can even be negative when $a'(L)$ exceeds 0.5.
Nevertheless, a simple comparison of the relative sizes of $v(d_j)$ and $v(e_j)$ over the $m$ domains
($j = 1, \ldots, m$) provides a robust method for choosing between the two estimators, $d_j$ and $e_j$.
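
A small Python sketch of this comparison follows. It evaluates
$v(e_j) = [1 - 2a'(L)]\,v(d_j) + [a'(L)]^2[d_j - \mu'(L)]^2$ and counts the domains in which it
falls below $v(d_j)$. The function names and all numbers are hypothetical.

```python
def mse_estimate_random_effects(v_d, a, d, mu):
    """v(e_j) = (1 - 2a) v(d_j) + a^2 (d_j - mu)^2; can be negative once a > 0.5."""
    return (1.0 - 2.0 * a) * v_d + a ** 2 * (d - mu) ** 2


def count_wins(v_d_list, v_e_list):
    """Number of domains in which v(e_j) is smaller than v(d_j)."""
    return sum(1 for vd, ve in zip(v_d_list, v_e_list) if ve < vd)


# Hypothetical domains: modest shrinkage, then heavy shrinkage with d_j equal to mu.
v_modest = mse_estimate_random_effects(100.0, 0.2, 50.0, 40.0)
v_heavy = mse_estimate_random_effects(100.0, 0.6, 50.0, 50.0)
wins = count_wins([100.0, 100.0, 100.0], [v_modest, v_heavy, 150.0])
```

The second call illustrates the instability noted above: with $a'(L) = 0.6$ and $d_j$ equal to
$\mu'(L)$, the estimated mean squared error is negative.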


4. AN EMPIRICAL EXAMPLE 


The Human Nutrition Information Service (HNIS) conducted a stratified, multistage survey 
of one day food intake by women aged 19-50 in 1985 as part of its Continuing Survey of Food 
Intakes by Individuals (CSFII). Responses were converted into measured intakes from among 
60 food groups and 27 nutrients. See Human Nutrition Information Service (1985) for more 
details on the survey and its sample design. 
We will restrict our attention here to the estimation of mean intake of milk and milk
products (one of the 60 food groups) by women 19-34 and 35-50 within 12 mutually exclusive
domains. These domains are defined by two cross classifications: region (northeast, midwest,
south, and west) and level of urbanization (central city, suburban, non-metropolitan). HNIS
published mean food group intakes separately for these two age groups on the national level
only. Mean nutrition intakes were published for each age group by region and level of
urbanization but were not cross-classified.

The CSFII sample design employed an independent stratified multistage sample within each
of these domains. First primary sampling units (cities or towns) were chosen using probability
proportional to size sampling with replacement, then a random subsample of area segments
was selected from which a smaller random subsample of households was chosen. I added
another level of subsampling. When more than one woman per household from an age group
was in the CSFII sample, I randomly chose one.

For each group, $d_j$ in equation (2) defines the conventional design-based estimate of the
domain mean. The SESUDAAN program (Shah 1980) provided design consistent estimators of
all the $d_j$ and their design root mean squared errors ($\sqrt{\mathrm{MSE}(d_j)}$). These
estimators, when squared, are not necessarily model unbiased estimators of the model variance of
$d_j$ under equation (1), however.

To see this, we confine our attention not only to an age group but to a domain as well and
suppress the subscript $j$. Let $h = 1, \ldots, H$ denote strata, $k = 1, \ldots, K_h$ denote primary
sampling units (PSU’s) in $h$, and $i = 1, \ldots, n_{hk}$ denote sampled women in $hk$. The estimate
of the mean intake is


$$ d = \sum_{h=1}^{H} \sum_{k=1}^{K_h} \sum_{i=1}^{n_{hk}} w_{hki}\, y_{hki}. $$


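We need more notation before we proceed. Let


$$ x_{hk} = \sum_{i=1}^{n_{hk}} w_{hki}, $$

$$ z_{hk} = \sum_{i=1}^{n_{hk}} w_{hki}^2, $$

$$ s_{hk} = \sum_{i=1}^{n_{hk}} w_{hki}\,(y_{hki} - d), $$

and

$$ \bar{s}_h = \sum_{k=1}^{K_h} s_{hk}/K_h. $$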


If we assume the population size of the domain is large enough to be ignored (this also
virtually assures that no individual had been sampled twice), the model variance of $d$ is

$$ \mathrm{var}_\varepsilon(d) = \delta^2 \sum_h \sum_k \sum_i w_{hki}^2
= \delta^2 \sum_h \sum_k z_{hk}. $$


The SESUDAAN (linearization) estimator for the design mean squared error of $d$ is


$$ v^*(d) = \sum_{h=1}^{H} (K_h/[K_h - 1]) \sum_{k=1}^{K_h} (s_{hk} - \bar{s}_h)^2. $$


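After much manipulation the model expectation of this can be shown to be


$$ E_\varepsilon[v^*(d)] = \delta^2 \Bigg[ \sum_h \sum_k z_{hk}
- 2 \sum_h (K_h/[K_h - 1]) \Big( \sum_k z_{hk} x_{hk} - \sum_k z_{hk} \sum_k x_{hk} / K_h \Big) $$

$$ \qquad + \Big( \sum_h \sum_k z_{hk} \Big) \sum_h (K_h/[K_h - 1])
\Big( \sum_k x_{hk}^2 - \Big\{ \sum_k x_{hk} \Big\}^2 / K_h \Big) \Bigg]. $$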
k 
Following Kott (1987),

$$ v(d) = v^*(d)\, \mathrm{var}_\varepsilon(d) / E_\varepsilon[v^*(d)] $$


is both a design consistent estimator for the mean squared error of $d$ (under certain conditions)
and a model unbiased estimator of the model variance of $d$.
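
The adjustment can be sketched in Python as follows. The code assumes that the weights sum to one
over the whole domain sample, that every stratum holds at least two PSUs, and that
$x_{hk} = \sum_i w_{hki}$, $z_{hk} = \sum_i w_{hki}^2$ and $s_{hk} = \sum_i w_{hki}(y_{hki} - d)$. The
function name, the nested-list layout and the data are hypothetical.

```python
def adjusted_variance(strata):
    """Return (d, v_star, v) for one domain.

    strata: list of strata; each stratum is a list of PSUs; each PSU is a
    list of (weight, value) pairs.
    """
    d = sum(w * y for h in strata for psu in h for w, y in psu)
    z_total = sum(w * w for h in strata for psu in h for w, _ in psu)
    v_star = 0.0      # linearization estimator v*(d)
    expect = z_total  # E[v*(d)] / delta^2, accumulated stratum by stratum
    for h in strata:
        K = len(h)
        factor = K / (K - 1.0)
        x = [sum(w for w, _ in psu) for psu in h]
        z = [sum(w * w for w, _ in psu) for psu in h]
        s = [sum(w * (y - d) for w, y in psu) for psu in h]
        s_bar = sum(s) / K
        v_star += factor * sum((sk - s_bar) ** 2 for sk in s)
        expect -= 2.0 * factor * (
            sum(zk * xk for zk, xk in zip(z, x)) - sum(z) * sum(x) / K
        )
        expect += z_total * factor * (sum(xk * xk for xk in x) - sum(x) ** 2 / K)
    # var(d) / delta^2 equals z_total, so delta^2 cancels in the ratio.
    return d, v_star, v_star * z_total / expect


# Hypothetical design: one stratum, two single-unit PSUs with unequal weights.
d, v_star, v = adjusted_variance([[[(0.3, 10.0)], [(0.7, 20.0)]]])

# With equal weights the adjustment leaves the linearization estimator unchanged.
d_eq, v_star_eq, v_eq = adjusted_variance(
    [[[(0.25, 1.0)], [(0.25, 2.0)], [(0.25, 3.0)], [(0.25, 4.0)]]]
)
```

Because the unknown $\delta^2$ appears in both $\mathrm{var}_\varepsilon(d)$ and
$E_\varepsilon[v^*(d)]$, only the weight sums are needed to form the correction factor.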

Calculations for $n_j$, $d_j$, $a'(L)$, $e_j$, $v(d_j)$ and $v(e_j)$ for the 12 domains in each of the
two groups are displayed in Table 1 (the domain subscript $j$ has been returned to $d_j$ and $e_j$).
Using equation (7), $L$ was calculated to be 0.055 for women 19-34 and 0.037 for women 35-50. This
suggests that women in the same domain had little in common over and above their membership
in the same age group. Nevertheless, $a'(L)$ exceeded 0.5 only for five (out of 24) cells,
all with samples of under 25 women.

The estimate $v(e_j)$ was negative twice and less than $v(d_j)$ 18 out of 24 times, nine times for
each age group. This latter group of numbers suggests to me that the $e_j$ are indeed better
estimates than the $d_j$. Formally, if we treat each of the 24 differences, $v(e_j) - v(d_j)$, as if
they were independent across domains (they aren’t quite), the hypothesis that the true model (or
design) mean squared errors of $e_j$ and $d_j$ are equal and the random variable $v(e_j) - v(d_j)$ as
likely positive as negative is soundly rejected.
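
The text does not name the test behind this statement. One simple way to formalize it, assuming
independence across the 24 cells, is a sign test on the count of 18 negative differences. The
Python sketch below computes the one-sided binomial tail probability; the choice of test and the
function name are assumptions made for illustration.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """One-sided P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# 18 of the 24 differences v(e_j) - v(d_j) were negative.
p_one_sided = sign_test_p_value(18, 24)
```

The resulting tail probability is a little above 0.01, which is consistent with rejecting the
hypothesis that the differences are as likely positive as negative.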

The reduction in mean squared error from using $e_j$ in place of $d_j$ is estimated (by
$\sum \{v(e_j) - v(d_j)\}/\sum v(d_j)$) to be 40.6%. This translates into a standard error reduction of
22.9%. Note that because we are summing 24 near independent random variates, we have much
more confidence in this estimate than any particular $v(e_j)$ (or $v(d_j)$ for that matter).
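
The step from a 40.6% reduction in mean squared error to a 22.9% reduction in standard error is
the square-root conversion $1 - \sqrt{1 - 0.406}$. A brief Python check, with hypothetical function
names and hypothetical inputs to the pooled ratio:

```python
from math import sqrt


def relative_mse_reduction(v_d_list, v_e_list):
    """Pooled relative reduction: (sum v(d_j) - sum v(e_j)) / sum v(d_j)."""
    return (sum(v_d_list) - sum(v_e_list)) / sum(v_d_list)


def se_reduction(mse_reduction):
    """Standard error reduction implied by a relative mean squared error reduction."""
    return 1.0 - sqrt(1.0 - mse_reduction)


# Hypothetical pooled comparison over two domains.
pooled = relative_mse_reduction([100.0, 200.0], [60.0, 120.0])

# The reported 40.6% reduction in mean squared error.
reported = se_reduction(0.406)
```

Pooling over domains before taking the ratio is what lends the summary more stability than any
single $v(e_j)$.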
Table 1
Estimated Values for the Domains by Age Group

Domain   Sample Size     $d_j$     $e_j$    $v(d_j)$   $v(e_j)$   $a'(L)$

Women 19-34

N-C          68          220.6     222.4      683.0      367.5     .233
N-S          95          198.7     203.1      568.8      367.8     [illegible]
N-R      [illegible]     219.1     223.8     5266.7    -1349.5     .630
M-C          55          270.7     258.6     2021.5     1525       [illegible]
M-S         107          2112      267.8      625.8      509.6     .164
M-R          73          301.1     285.9     4027.1     2754.3     .187
So-C         66          212.4  [illegible]  3011.6     1700.1     .220
So-S        112          156.8     167.9      472.8      457.3     .146
So-R         81          117.0     159.3      592.0      868.9     .184
W-C      [illegible]     403.0     33332     2064.2     5438.4     .364
W-S          74          205.0     209.6     1704.0     1018.3     .207
W-R          13          120.0     190.7     453355     3924.3     .652

Women 35-50

N-C          44          2053      197.4     1716.1      318.4     .425
N-S          67          135.0     1532      1068.8      698.0     .326
N-R          21          206.1     195.4  [illegible]     56.6     .550
M-C          28           89.0     13925      470.3     2559.9     .482
M-S          87          200.3     196.1     2128.5     1049.2     .258
M-R          38          304.9     250.7     6065.3     39739      .415
So-C         47          136.1     159.6      266.7      592.6     .421
So-S         93          161.0     167.7     1492.5      809.1     .244
So-R     [illegible]     128.8     146.3     1023.4      790.9     .263
W-C          23          205.5     193.9     7497.1    -1067.6     .580
W-S          88          245.1     229.1     2484.7     1432.2     .263
W-R      [illegible]     152.1     173.3      743.3     1344.1     .734


Domain Codes
N - Northeast; M - Midwest; So - South; W - West; C - Central City; S - Suburban;
R - Non-metropolitan.


5. DISCUSSION 


Let $n_j^* = 1/\sum_{i=1}^{n_j} w_{ji}^2$ define the effective sample size within domain $j$. Observe
that $n_j^* \le n_j$, where equality holds if and only if the sampling weights within $j$ are all equal
to $1/n_j$. For a known $\sigma^2/\delta^2$, the only difference between the optimal estimator
developed here, $f_j(a^*, c^*)$, and the best linear unbiased predictor in Scott and Smith (1969) is
that $1/n_j^*$ has replaced $1/n_j$ in the formula for $a^*$ (equation (5)). The effect of this when
the $w_{ji}$ within $j$ are not all equal is to increase $a^*$; that is, to increase the dependence on
sample information from outside domain $j$. This happens because forcing the estimator to be design
consistent results in the domain $j$ sample not being used as efficiently as possible. We could
penalize the sample from outside the domain in a conformal manner by using sample weights in
determining $\mu'(L)$, but that would only decrease the model efficiency of the estimator without
improving any design-based characteristic.
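
The effective sample size is simple to compute from the normalized weights. A minimal Python
sketch, with a hypothetical function name and hypothetical weights:

```python
def effective_sample_size(weights):
    """n_j* = 1 / sum_i w_ji^2 for sampling weights that sum to one."""
    return 1.0 / sum(w * w for w in weights)


# Equal weights give n_j* = n_j; unequal weights give a smaller value.
n_star_equal = effective_sample_size([0.25, 0.25, 0.25, 0.25])
n_star_unequal = effective_sample_size([0.5, 0.25, 0.25])
```

In the second call three units are sampled, but the unequal weights leave an effective sample
size below three.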
Equation (7) assures that $L$ can be no less than zero. This means that $a'(L)$ can be no greater
than $\sum_{g \ne j} n_g/(\sum_{g \ne j} n_g + n_j^*)$. If $a'(L)$ were equal to its upper bound and
$n_j^* = n_j$, then $e_j$ would collapse into the simple mean of the $y_{gi}$ across the entire sample.
This makes sense because when the full model in equation (4) is correct and $\sigma^2 = 0$, the most
efficient estimator of $\mu + \tau_j = \mu$ is the full sample mean.

If $n_j^* < n_j$ and $L = 0$, however, then $e_j$ will be calculated with more weight given to units
outside of domain $j$ than to units inside the domain, which makes little sense. One ad hoc way
to get around this phenomenon is to set an upper bound of $1 - (n_j/\sum_g n_g)$ (or smaller) on
$a'(L)$. Another approach would be to abandon small domain estimation entirely when $a'(L)$
as calculated in the text exceeds $1 - (n_j/\sum_g n_g)$. Note that $L$, the estimated value for
$\sigma^2/\delta^2$, would have to be very small for this to happen. In the empirical study discussed
in the previous section, $L$ was in the 0.03 to 0.06 range, yet $a'(L)$ was always well below
$1 - (n_j/\sum_g n_g)$.
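
The bound implied by $L \ge 0$ and the ad hoc cap can be written out as below. This is only an
illustrative Python sketch; the function names and sample sizes are hypothetical.

```python
def shrinkage_upper_bound(n_j_star, other_sizes):
    """Largest possible a'(L), reached at L = 0: sum_{g != j} n_g / (sum_{g != j} n_g + n_j*)."""
    outside = sum(other_sizes)
    return outside / (outside + n_j_star)


def capped_shrinkage(a, n_j, other_sizes):
    """Ad hoc rule: do not let a'(L) exceed 1 - n_j / sum_g n_g."""
    cap = 1.0 - n_j / (n_j + sum(other_sizes))
    return min(a, cap)


# Hypothetical domain: 10 sampled units, effective size 5, 50 units sampled elsewhere.
bound = shrinkage_upper_bound(5.0, [20, 30])
capped = capped_shrinkage(0.9, 10, [20, 30])
untouched = capped_shrinkage(0.4, 10, [20, 30])
```

When $n_j^* = n_j$ the bound and the cap coincide, so the cap only binds when the weights within
the domain are unequal.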

There are two ways the full model in equation (4) may fail. The fixed effects model within
each domain (equation (1)) can fail or the linking model in (3) can fail. In the real world, both
models are likely to be wrong. Equation (1) for its part ignores stratification and clustering
effects as well as any subtle effect of membership in a household with more than one woman
in the same age group. None of these effects are likely to be great. Moreover, by incorporating
sampling weights into the estimate $d_j$ and forcing the mean squared error estimators to be
design consistent, we have done as much as we can do to protect ourselves against the
potential for model failure in equation (1).

On the other hand, we should have little faith in the viability of the linking model. It is hardly 
more than a statistical convenience that, among other things, fails to allow for any correla- 
tion in the intakes of women from the same region but from different levels of urbanization 
or vice versa. 

As noted, simply counting the number of times $v(e_j) - v(d_j)$ is negative provides a means
for choosing between the estimators $d_j$ and $e_j$ that is independent of the linking model. The
estimator $v(e_j)$ is unstable, however, and should not be used by itself as an estimate of mean
squared error in practice.

Not only are the estimates of the mean squared error of $e_j$ unstable, the $v(d_j)$ are only
slightly better. At best $v(d_j)$ has “degrees of freedom” equal to the number of PSU’s minus
the number of strata in $j$. For the CSFII sample, these range from 2 to 7.

Since it is becoming increasingly necessary for statisticians to provide estimated standard
errors along with the estimated means they publish, it is imperative that more stable estimators
than $v(d_j)$ and $v(e_j)$ be found. One idea might be to fit the $v(d_j)$ and the $v(e_j)$, either
together or separately, with a variance estimating function. This approach is ad hoc, however, and
may do little more than return values close to fully model-dependent estimates of the mean squared
errors of the $d_j$ and $e_j$ (see Prasad and Rao 1986, for a good discussion of these) by “averaging
out” the effects of model failure.

One intriguing idea is to combine the stable, but biased, model-dependent mean squared
error estimates with the design consistent estimates developed here, much like $e_j$ does for
means. How this should be done is a topic that deserves future attention.
Estimation of Livestock Inventories Using
Several Area- and Multiple-Frame Estimators


ABSTRACT


Estimation of total numbers of hogs and pigs, sows and gilts, and cattle and calves in a state is studied 
using data obtained in the June Enumerative Survey conducted by the National Agricultural Statistics 
Service of the U.S. Department of Agriculture. It is possible to construct six different estimators using 
the June Enumerative Survey data. Three estimators involve data from area samples and three estimators 
combine data from list-frame and area-frame surveys. A rotation sampling scheme is used for the area 
frame portion of the June Enumerative Survey. Using data from the five years, 1982 through 1986, 
covariances among the estimators for different years are estimated. A composite estimator is proposed 
for the livestock numbers. The composite estimator is obtained by a generalized least-squares regres- 
sion of the vector of different yearly estimators on an appropriate set of dummy variables. The com- 
posite estimator is designed to yield estimates for livestock inventories that are ‘‘at the same level’’ as 
the official estimates made by the U.S. Department of Agriculture. 


KEY WORDS: June Enumerative Surveys; Rotation sample; Composite estimator; Generalized least 
squares. 


1. INTRODUCTION 


The National Agricultural Statistics Service (NASS), formerly the Statistical Reporting
Service, of the U.S. Department of Agriculture (USDA) conducts probability surveys in June each
year (the June Enumerative Surveys) to obtain data on farming operations. The survey data
are a critical input in the construction of the official estimates of livestock numbers, crop
acreages, grain stocks, etc. for the different states and for the United States as a whole. The
sampling units in the farm surveys are selected from area frames and from list frames.

The area frame for a given state is the geographic area of the state stratified according to 
land use. The strata are defined by the percentage of the area that is cultivated, and whether 
the area is mainly urban, woodland, lakes, or other nonagricultural land. The sampling units 
for the area samples are called ‘‘segments’’, which vary in size in different states and strata, 
but are approximately one square mile in rural areas. 

For the estimation of livestock inventories, samples of farm operators are also drawn from 
lists of farmers who raise the particular livestock. These list frames are stratified by measures 
of size. The area-frame and list-frame survey data are combined to obtain multiple-frame 
estimators for livestock numbers at the state level. 

Different estimators can be constructed from the area-frame and list-frame samples. Statisti- 
cians within a state office of NASS calculate several estimators and make a recommendation 
for the official estimate of the number of livestock in the state. These materials are sent to the 
Agricultural Statistics Board within NASS in Washington D.C. The Board considers the dif- 
ferent sample estimators, the recommendations of the state office, industry data, regional-level 
summaries and balance sheets when constructing the official estimates. Charting techniques 
to maintain historical relationships among the data sources are also used by the Board. The
Agricultural Statistics Board sets official estimates so that the official state estimates sum to 
the official national estimate. 

A major drawback of the present procedure of establishing the official estimate is that there
is no statistical measure of precision available for the official estimate. In 1983, a long-range
planning group within NASS recommended that an objective procedure be developed for
combining the different probability-based estimators into a composite estimator for the official
estimate [see Allen, et al. 1983]. In 1984 it was recommended that a composite estimator should
be made available for the consideration of the Agricultural Statistics Board [see Bynum, et
al. 1985, p. 2].

The pooling of data from different, but related, samples and the combining of two or more 
estimators has been a subject of statistical research for many years. Some of this research is 
cited by Kuo (1986). Kuo also considers a composite estimator for livestock inventories based 
on USDA survey data. 

In this study we investigate a procedure for constructing a composite estimator for livestock 
numbers. The values of several estimators for livestock inventories for a number of years and 
the variances and covariances among estimators for the different years are used in the con- 
struction. Assuming that the relationships among these estimators are defined by a simple linear 
model, we obtain the generalized least-squares estimator for the livestock inventories in the 
last year of sample data. Because the time-series of estimates is important, the set of composite 
estimators is constrained such that the average of the estimates for all years prior to the cur- 
rent year is equal to the average of the corresponding official estimates. This preserves the level 
of the time-series relative to previous official estimates. Alternative level constraints could be 
imposed. 


2. AREA- AND MULTIPLE-FRAME ESTIMATORS 


In the area-frame June Enumerative Survey, sample segments are identified on maps and 
all farm operators who have farming activities within these segments are identified and inter- 
viewed. The interviewers determine whether or not the farm operators in a given sample seg- 
ment have their residences located within that segment. An area (or a collection of areas) of 
land within a sample segment that is under one type of management arrangement is called a 
"tract". A tract may be part of a farm or an entire farm. 

The interviewer obtains information on the farming operation for each tract within a sample 
segment, including the size of the tract. In addition, information is obtained on the total far- 
ming operation of each sample farm operator. This information can be used to construct three 
different estimators of totals. The three estimators are called the closed-, open- and weighted- 
segment area-frame estimators. They differ mainly in the way in which farm values are 
associated with the segment. 

The closed-segment area-frame estimator uses values associated with the operation of each 
tract within a sample segment. The open-segment area-frame estimator uses the values for the 
entire farm operation for those farms whose operators have their residences within the sample 
segment. The weighted-segment area-frame estimator uses values for the entire farm opera- 
tion for farms with tracts in the sample segment. The values are prorated to the tract level by 
multiplying the farm total by the proportion of the total farm area that is within the sample 
segment. The weighted-segment value for a segment is the sum of the prorated values 
over all tracts within the sample segment. The closed-, open- and weighted-segment area-frame 
estimators of totals are defined by multiplying the corresponding segment values by their 
segment weights (inverses of the probabilities of selection of the segments) and adding these values 
over all sample segments and strata within the state. The three estimators are compared and 
discussed by Houseman (1975) and Nealon (1984). 
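
The three segment values described above can be illustrated with a minimal Python sketch. The `Tract` record, its field names and all numbers are hypothetical and are not NASS data structures. The sketch also assumes, for simplicity, that each farm contributes a single tract to the segment.

```python
from dataclasses import dataclass


@dataclass
class Tract:
    """One tract in a sample segment (hypothetical record layout)."""
    tract_hogs: float   # hogs located on the tract itself
    farm_hogs: float    # hogs on the operator's entire farm
    tract_acres: float  # area of the tract inside the segment
    farm_acres: float   # total area of the farm
    resident: bool      # operator's residence lies inside the segment


def closed_value(tracts):
    # Closed segment: only what is on the tracts inside the segment.
    return sum(t.tract_hogs for t in tracts)


def open_value(tracts):
    # Open segment: whole-farm totals, but only for resident operators.
    return sum(t.farm_hogs for t in tracts if t.resident)


def weighted_value(tracts):
    # Weighted segment: whole-farm totals prorated by tract area / farm area.
    return sum(t.farm_hogs * t.tract_acres / t.farm_acres for t in tracts)


def expanded_total(segment_values, segment_weights):
    # Estimated total: segment values times inverse selection probabilities.
    return sum(v * w for v, w in zip(segment_values, segment_weights))


segment = [
    Tract(tract_hogs=40, farm_hogs=100, tract_acres=80, farm_acres=320, resident=True),
    Tract(tract_hogs=0, farm_hogs=60, tract_acres=50, farm_acres=200, resident=False),
]
closed = closed_value(segment)      # 40
opened = open_value(segment)        # 100
weighted = weighted_value(segment)  # 100 * 0.25 + 60 * 0.25 = 40.0
```

In this toy segment the weighted value prorates both farms by the share of their land inside the segment, whereas the open value counts the resident operator's whole farm and ignores the non-resident one.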

The closed-segment area-frame estimator is considered to have a smaller variance than the 
open-segment area-frame estimator for most variables that can be easily reported on a tract 
basis. Items such as farm expenditures and livestock deaths are not easily reported at the tract 
level. The closed-segment area-frame estimator is preferred for estimation of national crop 
acreages and is also calculated, along with other estimators, for livestock inventories in most 
states. When values of variables can easily be associated with tracts, the closed-segment area- 
frame estimator is generally preferred because it is believed that the data obtained are less subject 
to reporting error by farm operators than information for the whole farm. 

The weighted-segment area-frame estimator generally has the smallest variance of the three 
area-frame estimators. The weighted-segment estimator can be used for estimation of the 
population total for any agricultural item. Nealon (1984, p. 19) cites several research studies 
which show that the weighted-segment area-frame estimator is biased because the total farm 
size is frequently underreported. It is generally believed that some areas in woodland, 
pastureland, idle land, and farmsteads are not reported as part of the farm. If so, the ratio 
of the tract area to the total farm area will be too large and the weighted-segment area-frame 
estimator will be positively biased. 

Multiple-frame estimators for livestock inventories use sample data from two or more 
frames. In the case of livestock, there are usually two frames, the area frame and a list frame. 
The list frame is a list of operators that were known, at one time, to have the livestock of interest. 
The list frame is incomplete but generally contains many of the large operators. For estima- 
tion of hog inventories in the study state, multiple-frame estimators are obtained by summing 
the estimator for the total of the list frame constructed with the list sample and an estimator 
for the nonoverlap domain (those operators not found in the list frame) from the area sample. 
The list sample is considered to be independent of the area sample. Different multiple-frame 
estimators are obtained when the closed-, open- and weighted-segment area-frame estimators 
are used for the nonoverlap domain. 
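
As a rough illustration of the multiple-frame construction just described, the Python sketch below adds a list-frame total to an area-sample estimate for the nonoverlap domain. The function names, the record layout and the numbers are hypothetical. The variance function relies only on the stated independence of the list and area samples.

```python
def multiple_frame_estimate(list_total, area_records):
    """Sum of the list-frame estimate and the nonoverlap-domain estimate.

    list_total   -- estimated total for the list frame (from the list sample)
    area_records -- (expanded value, on_list) pairs from the area sample,
                    where on_list flags operators found in the list frame
    """
    nonoverlap = sum(value for value, on_list in area_records if not on_list)
    return list_total + nonoverlap


def multiple_frame_variance(var_list, var_nonoverlap):
    # The list sample is treated as independent of the area sample,
    # so the two variance components simply add.
    return var_list + var_nonoverlap


area_records = [(1200.0, True), (300.0, False), (450.0, False)]
estimate = multiple_frame_estimate(list_total=9000.0, area_records=area_records)
variance = multiple_frame_variance(var_list=0.6, var_nonoverlap=0.4)
```

Changing which segment values (closed, open or weighted) feed the nonoverlap records gives the three different multiple-frame estimators.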


3. COMPOSITE ESTIMATOR 


We propose a composite estimator for the livestock inventory constructed under the assump- 
tion that a linear model defines the relationship among the different estimators. Suppose that 
N estimators for a given livestock inventory are available in each of T years and that the 
Agricultural Statistics Board has made official estimates for the first $T-1$ years. It is assumed 
that a composite estimator for the livestock inventory in the $T$-th year is desired. 

Let $Y_{ti}$ represent the $i$-th estimator for the $t$-th year, where $t = 1, 2, \ldots, T$ and 
$i = 1, 2, \ldots, N$. We assume the linear model, 

$$Y_{ti} = \alpha_t + \beta_i + \epsilon_{ti}, \tag{3.1}$$

where $\alpha_t$ is the livestock inventory for the $t$-th year; 
$\beta_i$ is the effect associated with the $i$-th estimator; and 
$\epsilon_{ti}$ is a random error which has mean zero. 

The estimator effects, $\beta_1, \beta_2, \ldots, \beta_N$, are included to account for the fact that nonsampling 
errors may cause different estimators to have different expectations. Model (3.1) specifies the 
estimator effects to be additive and constant over years. The assumption of constant effects 
is a simple specification that is consonant with the data. 

The model (3.1) is a classical two-way, analysis-of-variance model, whose parameters are 
not estimable without additional model assumptions. To identify the parameters of the model, 
we restrict the average of the true livestock inventories in the first $(T-1)$ years to be equal to 
the average of the corresponding official estimates of the Agricultural Statistics Board. This 
restriction is 

$$\sum_{t=1}^{T-1} \alpha_t = \sum_{t=1}^{T-1} \tilde{\alpha}_t, \tag{3.2}$$

where $\tilde{\alpha}_t$ is the official estimate for the $t$-th year. This constraint forces the estimates of 
livestock inventories to be at the same level as the previous official estimates. This is judged a 
reasonable constraint because actual values for $\alpha_t$ cannot be obtained and the time series 
nature of the estimates is important. 

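Given the restriction (3.2), the linear model (3.1) can be expressed in terms of the parameters, 
$\alpha_2, \alpha_3, \ldots, \alpha_T$ and $\beta_1, \beta_2, \ldots, \beta_N$, as 

$$
\begin{aligned}
Y^*_{1i} &= -\sum_{t=2}^{T-1} \alpha_t + \beta_i + \epsilon_{1i}, \\
Y^*_{ti} &= \alpha_t + \beta_i + \epsilon_{ti},
\end{aligned}
\tag{3.3}
$$

where $t = 2, 3, \ldots, T$; $Y^*_{1i} = Y_{1i} - \sum_{t=1}^{T-1} \tilde{\alpha}_t$, with $\tilde{\alpha}_t$ the official 
estimate for the $t$-th year; and $Y^*_{ti} = Y_{ti}$ for $t \geq 2$. 

The model in matrix notation is 

$$Y^* = X\gamma + \epsilon, \tag{3.4}$$

where $Y^* = (Y^*_{11}, \ldots, Y^*_{1N}, Y^*_{21}, \ldots, Y^*_{2N}, \ldots, Y^*_{T1}, \ldots, Y^*_{TN})'$; 

$X$ is the $(NT \times K)$ matrix of dummy variables associated with 
the model (3.3), where $K = T - 1 + N$; 

$\gamma = (\alpha_2, \alpha_3, \ldots, \alpha_T, \beta_1, \beta_2, \ldots, \beta_N)'$; and 

$\epsilon$ is the $NT$-column vector of random errors, having covariance 
matrix, $V$. 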
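
The dummy-variable matrix and the transformed response vector can be written down mechanically. The Python sketch below is one possible construction. It assumes that the rows are ordered by year and then by estimator, and that the columns are ordered as the year parameters for years 2 to T followed by the N estimator effects. The function names are illustrative only.

```python
def design_matrix(T, N):
    """Dummy-variable matrix for the constrained two-way model.

    Rows are ordered by year, then estimator. Columns are the year
    parameters for years 2..T followed by the N estimator effects,
    so there are K = T - 1 + N columns.
    """
    K = T - 1 + N
    X = []
    for t in range(1, T + 1):
        for i in range(1, N + 1):
            row = [0.0] * K
            if t == 1:
                # Year 1 is eliminated through the level constraint:
                # coefficient -1 on the parameters for years 2..T-1.
                for s in range(2, T):
                    row[s - 2] = -1.0
            else:
                row[t - 2] = 1.0        # year parameter for year t
            row[T - 1 + (i - 1)] = 1.0  # effect of estimator i
            X.append(row)
    return X


def transformed_response(Y, official):
    """Subtract the sum of the T-1 official estimates from year-1 values.

    Y[t][i] holds the i-th estimator in year t+1 (zero-based lists).
    """
    level = sum(official)
    return [y - level if t == 0 else y for t, row in enumerate(Y) for y in row]


X = design_matrix(T=5, N=6)  # 30 rows and 10 columns for the empirical study
```

With five years and six estimators this gives the 30 observations and 10 parameters used in the empirical section.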


The covariance matrix, $V$, is the covariance matrix of the sampling errors, $\epsilon_{ti}$, associated 
with the different estimation procedures. The estimators, $Y_{ti}$, $t = 1, 2, \ldots, T$; 
$i = 1, 2, \ldots, N$, are correlated within any given year because they are based on the same area segments 
and the same list sample. The estimators are also correlated among years because sample 
segments in the area sample are included in the surveys for several years, according to a 
rotation sampling scheme. The list sample is selected independently each year. The variances and 
covariances of the estimators for any given year can be estimated by standard survey-sampling 
methods. Because the same list-sample estimator is used in defining the three multiple-frame 
estimators in a given year, the covariance between any two of the multiple-frame estimators 
in the same year will have a component due to the variance of the estimator obtained from 
the list sample. The covariances between estimators in different years, $\mathrm{Cov}(Y_{ti}, Y_{t'j})$, where 
$t \neq t'$, can be estimated by standard methods, using the sample segments that are 
common to the two years. If it is assumed that the variances and covariances in $V$ satisfy 
particular relationships, then these conditions can be imposed as part of the estimation 
procedure. 

Given an estimator of the covariance matrix, denoted by $\hat{V}$, the estimated generalized 
least-squares estimator of the parameter vector, $\gamma$, is 

$$\hat{\gamma} = (X'\hat{V}^{-1}X)^{-1} X'\hat{V}^{-1} Y^*. \tag{3.5}$$

The covariance matrix of $\hat{\gamma}$ is estimated by 

$$\widehat{\mathrm{Cov}}(\hat{\gamma}) = (X'\hat{V}^{-1}X)^{-1}. \tag{3.6}$$

The estimated generalized least-squares estimator, $\hat{\alpha}_T$, which is the $(T-1)$-th element of $\hat{\gamma}$, 
is a possible composite estimator for the livestock inventory for the $T$-th year. Its variance is 
estimated by the corresponding element of the estimated covariance matrix (3.6). Furthermore, 
the estimated generalized least-squares estimators, $\hat{\alpha}_T + \hat{\beta}_i$, $i = 1, 2, \ldots, N$, are adjusted 
area-frame and multiple-frame estimators for livestock inventories in the $T$-th year which are 
based on the model (3.4). The variances of these adjusted estimators are estimated by obtaining 
the appropriate linear functions of the estimated covariance matrix (3.6). 
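
The generalized least-squares computation can be sketched in a few lines of dependency-free Python. This is an illustration, not the software used in the study. The helper names are hypothetical, and the toy example combines two uncorrelated estimators of a common mean rather than the 30-observation system of the paper.

```python
def solve(A, B):
    """Solve A Z = B (B may have several columns) by Gauss-Jordan
    elimination with partial pivoting; A is assumed nonsingular."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gls(X, V, y):
    """Return the GLS coefficient vector and its estimated covariance matrix."""
    Vinv_X = solve(V, X)                 # V^{-1} X
    Vinv_y = solve(V, [[v] for v in y])  # V^{-1} y
    Xt = transpose(X)
    A = matmul(Xt, Vinv_X)               # X' V^{-1} X
    b = matmul(Xt, Vinv_y)               # X' V^{-1} y
    K = len(A)
    identity = [[1.0 if r == c else 0.0 for c in range(K)] for r in range(K)]
    cov = solve(A, identity)             # (X' V^{-1} X)^{-1}
    gamma = [row[0] for row in matmul(cov, b)]
    return gamma, cov


# Toy example: two uncorrelated estimators of one mean, variances 1 and 4.
gamma, cov = gls(X=[[1.0], [1.0]], V=[[1.0, 0.0], [0.0, 4.0]], y=[10.0, 20.0])
```

In the toy example the more precise estimator receives weight 0.8, giving a combined value of 12 with variance 0.8.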

If the model (3.4) is true and the random errors have a normal distribution, then the weighted 
sum of squares, 


$$(Y^* - X\hat{\gamma})'\,V^{-1}\,(Y^* - X\hat{\gamma}), \tag{3.7}$$


has a chi-square distribution with $NT - K$ degrees of freedom. Thus the weighted residual sum of 
squares obtained by using the estimated covariance matrix yields an approximate test of the 
adequacy of the model (3.1). 


4. EMPIRICAL RESULTS 


4.1 Introduction 


In the USDA June Enumerative Surveys between 1982 and 1986, a total of 298 area segments 
were sampled in the study state. These segments were included in the June Enumerative Surveys 
according to a rotation sampling scheme in which approximately twenty percent of the segments 
are replaced each year. The actual replacement rate varies, but we construct estimators as if 
the rate was exactly twenty percent. 

The area frame for the state consists of eleven strata: nine strata are agricultural land, with 
varying percentages cultivated; one stratum is agri-urban land; and one stratum consists of 
residential or commercial areas. 

The list frame for hog producers in the study state consists of eleven strata, which are defined 
by the total number of hogs raised by the farm operators at a particular time. The strata 
are defined by operations with: no livestock, livestock but no hogs, 1-99 hogs, 100-199 
hogs, ..., more than 6,000 hogs. The list frame for cattle that is sampled in June contains 
very large operators. It was a small list of fewer than 500 operators in each of the study 
years. The cattle list is divided into four strata. Three strata are defined by the total number 
of cattle and calves, where the strata are between 1,000 and 2,999, between 3,000 and 9,999, 
and more than 10,000. The fourth stratum is composed of farm operators with at least 200 
dairy cattle. 

The total number of farm operators in the area sample of the June Enumerative Surveys 
averaged about 2,350 during the years studied with a range of 120. The list sample for hogs 
averaged about 2,400 farm operators with a range of 100, whereas the list sample for cattle 
averaged about 70 farm operators with a range of 71. Using these data, the values of the closed-, 
open-, and weighted-segment area-frame estimators and the three corresponding multiple-frame 
estimators for the total number of hogs and pigs, sows and gilts, and cattle and calves were 
computed for each of the five years. The estimates were obtained by use of PC CARP, which 
is a computer program for performing survey-sampling estimation on personal computers [see 
Fuller, et al. 1986 and Schnell, et al. 1988]. The variance estimators are the usual estimators 
for an estimated total constructed from a stratified cluster sample. See, for example, Cochran 
(1977). 

The data used for variance computations were treated as complete data although some data 
were imputed for nonresponse. The imputation, especially since the imputation methods draw 
heavily upon prior year data in the rotation scheme, may lead to an overestimate of the cor- 
relation between years. 


Table 1 


Estimates for livestock inventories in 1986 


                                Hogs and      Sows and      Cattle and 
                                pigs          gilts         calves 

Area-Frame Estimators 
  Closed-Segment                18.42         15.78         13.27 
                                (1.97)        (2.17)        (1.53) 
  Open-Segment                  211           18.24         18.74 
                                (2.82)        (2.69)        (2.35) 
  Weighted-Segment              21.69         18.85         15.48 
                                (1.67)        (1.62)        (1.15) 

Multiple-Frame Estimators 
  Closed-Segment                18.11         15.59         16.12 
                                (1.11)        (1.28)        (1.38) 
  Open-Segment                  18.06         13.29         19.97 
                                (1.26)        (1.39)        (2.08) 
  Weighted-Segment              18.50         15.82         16.22 
                                (1.00)        (1.00)        (1.00) 


Estimates for the livestock inventories in 1986 and the estimated standard deviations of the 
estimators for 1986 are given in Table 1. Each standard deviation in Table 1 is the square root 
of the average of the five estimated variances for the five years. The units in the table are deter- 
mined by coding the standard deviation of the weighted-segment multiple-frame estimators 
to be 1.00 for all livestock inventories. This makes comparison easy and also complies with 
confidentiality rules. 

As expected from previous studies [e.g., see Nealon (1984)], the open-segment area- 
frame estimator is the least precise estimator for livestock inventories. The most precise 
estimator is the multiple-frame estimator which uses the weighted-segment estimator for the 
nonoverlap domain. Coefficients of variation for the weighted-segment area-frame estimators 
are about 7% to 9%, whereas the weighted-segment multiple-frame estimators have coeffi- 
cients of variation of about 5.5% to 6.5%. Because the list sample for hog inventories 
is larger than that for cattle and calves, the precision of the multiple-frame estimators 
relative to the area-frame estimators is much greater for hog inventories than for cattle 
and calves. 
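
The coefficients of variation quoted above follow directly from Table 1, as the short Python check below shows. The dictionary layout and function name are illustrative. The numbers are the weighted-segment rows of Table 1.

```python
# 1986 estimates and standard deviations for the weighted-segment estimators,
# copied from Table 1 (order: hogs and pigs, sows and gilts, cattle and calves).
# The cattle value 1.15 agrees with the square root of 1.321 in Table 2.
area_weighted = {"estimate": [21.69, 18.85, 15.48], "sd": [1.67, 1.62, 1.15]}
mf_weighted = {"estimate": [18.50, 15.82, 16.22], "sd": [1.00, 1.00, 1.00]}


def cv_percent(row):
    """Coefficient of variation, in percent, for each inventory type."""
    return [100.0 * s / e for s, e in zip(row["sd"], row["estimate"])]


area_cv = cv_percent(area_weighted)  # roughly 7.7, 8.6 and 7.4 percent
mf_cv = cv_percent(mf_weighted)      # roughly 5.4, 6.3 and 6.2 percent
```

Because the units are coded so that the weighted-segment multiple-frame standard deviation is 1.00, its coefficient of variation is simply the reciprocal of the coded estimate.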


4.2 Estimation of Covariance Matrices 


The estimation of the covariance matrix for the six estimators for the five years of data pro- 
ceeded in several steps. The covariance matrix for the error vector, e, in (3.4) can be written 
in the form 


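$$
V = \begin{bmatrix}
V_{11} & V_{12} & V_{13} & V_{14} & V_{15} \\
V_{21} & V_{22} & V_{23} & V_{24} & V_{25} \\
V_{31} & V_{32} & V_{33} & V_{34} & V_{35} \\
V_{41} & V_{42} & V_{43} & V_{44} & V_{45} \\
V_{51} & V_{52} & V_{53} & V_{54} & V_{55}
\end{bmatrix}, \tag{4.1}
$$


where, for a particular inventory type, $V_{tj}$ is the $6 \times 6$ matrix of covariances between the six 
estimators for year $t$ and the six estimators for year $j$. With the rotation scheme used, the 
covariance of estimators in two years is a function of the number of rotation groups common 
to the two years. Let $k = |t - j|$ for $k = 0, 1, \ldots, 4$. Then the covariance matrix, $V_{tj}$, can 
be estimated from the area segments of the $5 - k$ rotation groups which are common to the 
two years $t$ and $j$. 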

We estimate the elements of the covariance matrix (4.1) imposing some additional assump- 
tions about its structure. Our primary interest is to compare the precision of the alternative 
estimators and this comparison is facilitated by the assumptions which follow. 

We assume that the covariance matrices for years that are the same distance apart are the 
same and are symmetric. That is, we assume 

$$V_{tj} = V_{t+r,\,j+r} \quad \text{and} \quad V_{tj} = V_{jt}, \tag{4.2}$$

where $V_{tj}$ are the submatrices of (4.1); $r = 0, 1, \ldots, \min(5-t, 5-j)$; and $t, j = 1, 2, \ldots, 5$. 
For $t = j$, the assumptions of (4.2) imply 

$$V_{11} = V_{22} = V_{33} = V_{44} = V_{55} = V_0.$$

For $t \neq j$, the assumptions of (4.2) imply the following: 

$$V_{12} = V_{23} = V_{34} = V_{45} = V_1,$$

$$V_{13} = V_{24} = V_{35} = V_2,$$

and 

$$V_{14} = V_{25} = V_3.$$

These assumptions are in reasonable agreement with the data. Good agreement was anticipated 
because the sample size is very stable over the five years and there were no large shifts in live- 
stock inventories. 

We estimate the distinct submatrices of (4.1) by averaging the corresponding estimated 
covariance matrices obtained from common segments. The averaging process was based on the 
correlation matrices. Let the covariance matrix of the estimated totals defined in (4.1) be expressed as 

$$V = SCS,$$

where $S$ is the $30 \times 30$ diagonal matrix of the estimated standard deviations of the six estimators for 
the five years and $C$ is the $30 \times 30$ correlation matrix, partitioned in the same manner as $V$ of (4.1). 

The estimator of the correlation matrix C is constructed by averaging estimates of the sub- 
matrices of C. Using the segments common to two years, the covariance matrix of the two 
vectors of estimated totals constructed with those segments was estimated by the usual stratified 
cluster formulae. The estimated covariance matrices were converted to correlation matrices 
and these estimates were called the direct estimates. Let 


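$$
\begin{aligned}
C_0 &= \left(\tfrac{5}{5}\right)\tfrac{1}{5}\,(\hat{C}_{11} + \hat{C}_{22} + \hat{C}_{33} + \hat{C}_{44} + \hat{C}_{55}), \\
C_1 &= \left(\tfrac{4}{5}\right)\tfrac{1}{4}\,(\hat{C}_{12} + \hat{C}_{23} + \hat{C}_{34} + \hat{C}_{45}), \\
C_2 &= \left(\tfrac{3}{5}\right)\tfrac{1}{3}\,(\hat{C}_{13} + \hat{C}_{24} + \hat{C}_{35}), \\
C_3 &= \left(\tfrac{2}{5}\right)\tfrac{1}{2}\,(\hat{C}_{14} + \hat{C}_{25}), \\
C_4 &= \left(\tfrac{1}{5}\right)\hat{C}_{15},
\end{aligned}
$$


where the $\hat{C}_{tj}$ are the directly estimated correlation matrices based on common segments. The 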
factors in parentheses represent the fraction of segments that are common to the estimates. 
This fraction arises from the rotation-sampling scheme in which twenty percent of the segments 
in the area sample are dropped from the sample each year and twenty percent new segments 
are added. By the independence assumption, the correlation between the segments rotated out 
and those rotated in is zero. 
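
The averaging step can be sketched as follows in Python. The sketch assumes five years with one fifth of the segments replaced each year, so that years k apart share a fraction (5 - k)/5 of the segments. The function name, the dictionary layout and the 1 x 1 toy matrices are hypothetical.

```python
def average_correlation(direct, k, years=5):
    """Average the direct correlation estimates for years k apart and scale
    by the fraction of segments common to two such years.

    direct maps (t, j) with j = t + k to a correlation matrix (nested lists)
    estimated from the segments common to years t and j.
    """
    pairs = [(t, t + k) for t in range(1, years - k + 1)]
    size = len(direct[pairs[0]])
    fraction_common = (years - k) / years
    C = [[0.0] * size for _ in range(size)]
    for pair in pairs:
        for r in range(size):
            for c in range(size):
                C[r][c] += fraction_common * direct[pair][r][c] / len(pairs)
    return C


# Toy 1 x 1 "matrices" for the four pairs of adjacent years.
direct = {(1, 2): [[0.6]], (2, 3): [[0.5]], (3, 4): [[0.7]], (4, 5): [[0.6]]}
C1 = average_correlation(direct, k=1)  # mean 0.6 times 4/5 = 0.48
```

The scaling reflects the fact that segments rotated out and segments rotated in contribute no correlation between the two years.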

Since the estimated correlation matrices, $\hat{C}_{tj}$, are not symmetric when $t \neq j$, the symmetric 
assumption, $V_{tj} = V_{jt}$, in (4.2) is imposed on the estimated covariance matrix by defining 

$$\tilde{C}_r = \tfrac{1}{2}(C_r + C_r'), \quad r = 1, 2, 3, 4.$$

Let $\hat{S}$ be the $6 \times 6$ diagonal matrix of the square roots of the average estimated variances of 
the six estimators, where the average is over the five years. Again, for confidentiality 
requirements, the estimated variances are standardized such that the estimated variance of the 
weighted-segment multiple-frame estimator is equal to 1.00. Then the estimated covariance 
matrix for the six estimators for the five years is 

$$
\hat{V} = \mathrm{diag}(\hat{S}, \hat{S}, \hat{S}, \hat{S}, \hat{S})
\begin{bmatrix}
C_0 & \tilde{C}_1 & \tilde{C}_2 & \tilde{C}_3 & \tilde{C}_4 \\
\tilde{C}_1 & C_0 & \tilde{C}_1 & \tilde{C}_2 & \tilde{C}_3 \\
\tilde{C}_2 & \tilde{C}_1 & C_0 & \tilde{C}_1 & \tilde{C}_2 \\
\tilde{C}_3 & \tilde{C}_2 & \tilde{C}_1 & C_0 & \tilde{C}_1 \\
\tilde{C}_4 & \tilde{C}_3 & \tilde{C}_2 & \tilde{C}_1 & C_0
\end{bmatrix}
\mathrm{diag}(\hat{S}, \hat{S}, \hat{S}, \hat{S}, \hat{S}). \tag{4.3}
$$


The estimated covariance matrices, $\hat{V}_0 = \hat{S} C_0 \hat{S}$, for the livestock inventories are given 
in Table 2. The estimates of the four unique off-diagonal submatrices, $\hat{V}_r = \hat{S} \tilde{C}_r \hat{S}$, 
$r = 1, 2, 3, 4$, are available from the authors on request. 


Table 2 


Estimated covariance matrices for the six 
estimators of livestock inventories within a year 


                        Area-Frame Estimators         Multiple-Frame Estimators 
                        Closed   Open    Weighted     Closed   Open    Weighted 

A. Hogs and pigs 
  Area-Frame 
    Closed              3.886    4.077   2.366        0.654    0.688   0.405 
    Open                4.077    7.959   2.394        0.698    1.150   0.430 
    Weighted            2.366    2.394   2.784        0.373    0.409   0.481 
  Multiple-Frame 
    Closed              0.654    0.698   0.373        1.242    1.239   0.936 
    Open                0.688    1.150   0.409        1.239    1.590   0.937 
    Weighted            0.405    0.430   0.481        0.936    0.937   1.000 

B. Sows and gilts 
  Area-Frame 
    Closed              4.720    4.274   2.455        1.102    1.112   0.572 
    Open                4.274    7.260   2.322        1.119    1.427   0.548 
    Weighted            2.455    2.322   2.621        0.481    0.487   0.499 
  Multiple-Frame 
    Closed              1.102    1.119   0.481        1.638    1.658   1.033 
    Open                1.112    1.427   0.487        1.658    1.934   1.033 
    Weighted            0.572    0.548   0.499        1.033    1.033   1.000 

C. Cattle and calves 
  Area-Frame 
    Closed              2.34     1.951   1.141        1.853    1.655   0.907 
    Open                1.951    5.52    1.014        1.652    4.418   0.912 
    Weighted            1.141    1.014   1.321        0.913    0.891   0.925 
  Multiple-Frame 
    Closed              1.853    1.652   0.913        1.910    1.756   0.992 
    Open                1.655    4.418   0.891        1.756    4.310   1.017 
    Weighted            0.907    0.912   0.925        0.992    1.017   1.000 


Consider a sample composed of a common set of rotation groups observed in each of the 
five years, rather than the existing sample in which twenty percent of the sample segments are 
dropped each year. For the sample with no rotation, the covariance matrix of the six estimators 
for the five years, expressed in terms of the submatrices of (4.1), is 


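$$
\begin{bmatrix}
V_{11} & \tfrac{5}{4}V_{12} & \tfrac{5}{3}V_{13} & \tfrac{5}{2}V_{14} & 5V_{15} \\
\tfrac{5}{4}V_{21} & V_{22} & \tfrac{5}{4}V_{23} & \tfrac{5}{3}V_{24} & \tfrac{5}{2}V_{25} \\
\tfrac{5}{3}V_{31} & \tfrac{5}{4}V_{32} & V_{33} & \tfrac{5}{4}V_{34} & \tfrac{5}{3}V_{35} \\
\tfrac{5}{2}V_{41} & \tfrac{5}{3}V_{42} & \tfrac{5}{4}V_{43} & V_{44} & \tfrac{5}{4}V_{45} \\
5V_{51} & \tfrac{5}{2}V_{52} & \tfrac{5}{3}V_{53} & \tfrac{5}{4}V_{54} & V_{55}
\end{bmatrix}. \tag{4.4}
$$


Direct sample estimates of the submatrices, $V_{tj}$, obtained from segments common to years $t$ 
and $j$ sometimes gave a covariance matrix (4.4) that was not positive definite. For example, 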
this can happen if operators with very large holdings are among those operators in the one 
rotation common to all five years. When the assumptions of (4.2) are imposed in the estima- 
tion process, the estimates of the covariance matrix (4.4) were positive definite for all three 
livestock inventories. 
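
The block multipliers for the no-rotation covariance matrix can be tabulated with a small Python sketch. It assumes that each block for years t and j is inflated by the inverse of the fraction of rotation groups the two years share under the rotating design, which is (5 - |t - j|)/5. The function name is illustrative.

```python
from fractions import Fraction


def no_rotation_factor(t, j, groups=5):
    """Multiplier for the covariance block of years t and j when all
    rotation groups are retained: the inverse of the shared fraction."""
    return Fraction(groups, groups - abs(t - j))


factors = [[no_rotation_factor(t, j) for j in range(1, 6)] for t in range(1, 6)]
first_row = [str(f) for f in factors[0]]  # ['1', '5/4', '5/3', '5/2', '5']
```

The largest multiplier, 5, applies to years four apart, which share only one rotation group. A few large operations in that single group can therefore dominate the direct estimate.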


Table 3 


Composite estimates for the livestock inventories in 
1986 and the effects for different estimators 


                                        Hogs and     Sows and     Cattle and 
                                        pigs¹        gilts¹       calves¹ 

Composite Estimator                     18.84        18.06        16.43 
                                        (1.01)       (1.02)       (1.03) 
Effects of Area-Frame Estimators 
  Closed-Segment                        -1.13        -2.26        -0.21 
                                        (1.30)       (1.36)       (0.99) 
  Open-Segment                           0.26        -1.09         1.03 
                                        (1.86)       (1.78)       (1.45) 
  Weighted-Segment                       1.24        -0.94        -0.26 
                                        (1.14)       (1.10)       (0.80) 
Effects of Multiple-Frame Estimators 
  Closed-Segment                        -0.33        -1.86         0.04 
                                        (0.66)       (0.78)       (0.92) 
  Open-Segment                          -0.11        -1.82         1.40 
                                        (0.75)       (0.84)       (1.32) 
  Weighted-Segment                       0.19        -1.74        -0.31 
                                        (0.59)       (0.59)       (0.69) 

¹ Standard errors are given in parentheses. 


4.3 Model Estimation 


Given the estimated covariance matrix, $\hat{V}$, we estimate the parameters of model (3.4) by 
using the estimated generalized-least-squares estimator (3.5). The values of the composite 
estimator, $\hat{\alpha}_T$, for 1986 livestock inventories are given in the first line of Table 3. The six 
estimator effects, denoted by $\beta_i$ in model (3.3), are also given in the table. The estimated 
standard deviation of the composite estimator is slightly larger than that of the weighted-segment 
multiple-frame estimator. The increase in variance comes from the fact that the level of the 
estimator is estimated using past sample estimates and past official estimates. 

The residual sums of squares defined in (3.7) were 18.22, 15.38, and 24.59, for hogs and 
pigs, sows and gilts, and cattle and calves, respectively. There are 20 degrees of freedom because, 
for each livestock inventory, there are thirty observations in the $Y^*$-vector and ten parameters 
are estimated in $\gamma$. In no case does the residual sum of squares exceed 31.41, which is the 95-th 
percentile for the chi-square distribution with 20 degrees of freedom. 
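
The degrees of freedom and the comparison with the 95-th percentile can be checked with the Python sketch below. To stay within the standard library, the chi-square percentile is obtained from the Wilson-Hilferty approximation rather than an exact routine. That approximation is a choice made for this illustration, not the method used in the paper.

```python
from statistics import NormalDist


def chi2_quantile_wh(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * df)
    return df * (1.0 - a + z * a ** 0.5) ** 3


N, T = 6, 5                 # six estimators, five years
K = T - 1 + N               # ten estimated parameters
df = N * T - K              # 30 - 10 = 20 degrees of freedom
critical = chi2_quantile_wh(0.95, df)  # close to the tabulated 31.41

# Weighted residual sums of squares reported in the text.
residual_ss = {
    "hogs and pigs": 18.22,
    "sows and gilts": 15.38,
    "cattle and calves": 24.59,
}
rejected = {name: ss > critical for name, ss in residual_ss.items()}
```

None of the three residual sums of squares exceeds the critical value, so the additive model with constant estimator effects is not rejected at the 5% level.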

The composite estimator in Table 3 has nearly the same standard deviation as the weighted- 
segment multiple-frame estimator, the estimator with the smallest standard deviation (Table 1). 
Thus, one would expect the optimal linear combination of the six estimators for a single year 
to assign the majority of the weight to the weighted-segment multiple-frame estimator, and 
this is the case. The minimum variance weights for the data of a single year are calculated as 


$$(J'\hat{V}_0^{-1}J)^{-1}\,J'\hat{V}_0^{-1},$$


where $J' = (1, 1, 1, 1, 1, 1)$ and $\hat{V}_0$ is the covariance matrix of the six estimators given in 
Table 2 [see the diagonal elements of (4.3)]. The optimal weights and the estimated standard 
deviation of the optimal combination of the six estimators are presented in Table 4. Note that 
the sum of the weights is one for each livestock inventory. The difference between these stan- 
dard errors and those of the first line of Table 3 is due to the estimation of level in the con- 
struction of the estimates of Table 3. 
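
The single-year minimum-variance weights can be recomputed from the published covariance matrix. The Python sketch below uses the hogs and pigs panel of Table 2 and a small hand-written linear solver. The helper name `solve` is illustrative. Because the inputs are rounded to three decimals, the results should agree with Table 4 only approximately.

```python
# Table 2, panel A (hogs and pigs): covariance matrix of the six estimators.
V0 = [
    [3.886, 4.077, 2.366, 0.654, 0.688, 0.405],
    [4.077, 7.959, 2.394, 0.698, 1.150, 0.430],
    [2.366, 2.394, 2.784, 0.373, 0.409, 0.481],
    [0.654, 0.698, 0.373, 1.242, 1.239, 0.936],
    [0.688, 1.150, 0.409, 1.239, 1.590, 0.937],
    [0.405, 0.430, 0.481, 0.936, 0.937, 1.000],
]


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]


u = solve(V0, [1.0] * 6)           # inverse covariance applied to a vector of ones
total = sum(u)                     # the quadratic form in the vector of ones
weights = [x / total for x in u]   # minimum-variance weights, summing to one
std_error = (1.0 / total) ** 0.5   # standard error of the optimal combination
```

The weights sum to one by construction, and most of the weight falls on the weighted-segment multiple-frame estimator, in line with the first column of Table 4.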


Table 4 


Optimal weights for six estimators in a single year 


                                        Inventory type 

                              Hogs and      Sows and      Cattle and 
Estimators                    pigs          gilts         calves 

Area-Frame 
  Closed-Segment               0.0541       -0.0152        0.0525 
  Open-Segment                -0.0084        0.0152        0.0656 
  Weighted-Segment             0.1463        0.1909        0.0909 
Multiple-Frame 
  Closed-Segment               0.1640       -0.0218       -0.0353 
  Open-Segment                -0.0116       -0.0191       -0.0772 
  Weighted-Segment             0.6556        0.8500        0.9035 

Estimated standard error 
of optimal combination         0.94          0.95          0.99 


Table 5 


Estimated correlation coefficients between the weighted-segment 
area-frame estimators based on a common rotation group 


h        Hogs & pigs      Sows & gilts      Cattle & calves 
0        1.000            1.000             1.000 
1        0.606            0.590             0.592 
2        0.478            0.456             0.433 
3        0.365            0.336             0.258 
4        0.304            0.217             0.097 


4.4 Estimation Using Rotation Group Means 


In obtaining the estimates of Section 4.3, we did not use all of the available information. 
We used the estimators for each year, but did not decompose the estimators into the parts 
associated with each rotation group. In this section we construct an estimator using the indi- 
vidual rotation group means of the weighted-segment area-frame estimator. We retain the 
assumption that the variance of the estimator is the same across years. Under that assump- 
tion, the correlation coefficients are assumed to depend only on the number of years between 
the estimators involved. Let $\rho_h$ represent the correlation coefficient between the 
weighted-segment area-frame estimators for a common rotation group which is observed $h$ years apart, 
$h = 0, 1, \ldots, 4$. For the three inventory types, the estimated correlation coefficients are 
given in Table 5. The estimated correlation coefficients between estimators for $h$ years apart are 
the averages of the correlation coefficients estimated from the $5 - h$ rotation groups involved. 
There are a total of nine rotation groups for the five years. 

Let $Z_{tj}$ represent the weighted-segment area-frame estimator in rotation group $j$ for year 
$t$, where $j = t, t+1, \ldots, t+4$ and $t = 1, 2, \ldots, 5$. Then, for a given year, $t$, we assume that 
$Z_{tj}$ is an unbiased estimator of the unknown total inventory, $\alpha_t$. It is known that a rotation 
group bias may exist and need to be estimated, but we ignore that effect in this illustration. 
The model is 

$$Z_{tj} = \alpha_t + \epsilon_{tj}, \quad j = t, t+1, \ldots, t+4; \quad t = 1, 2, \ldots, 5, \tag{4.5}$$

where the errors, $\epsilon_{tj}$, have zero mean. The model (4.5), in matrix notation, is 

$$Z = D\alpha + \epsilon,$$

where $D = I_5 \otimes 1_5$, $I_5$ is the identity matrix of order 5 and $1_5$ is the $(5 \times 1)$ vector with all elements 
equal to one. 

Let the correlation matrix for the rotation-group estimators, $Z$, be 

$$
W = \begin{bmatrix}
W_0 & W_1 & W_2 & W_3 & W_4 \\
W_1' & W_0 & W_1 & W_2 & W_3 \\
W_2' & W_1' & W_0 & W_1 & W_2 \\
W_3' & W_2' & W_1' & W_0 & W_1 \\
W_4' & W_3' & W_2' & W_1' & W_0
\end{bmatrix},
$$

where $W_0 = I_5$, 

$$
W_1 = \begin{bmatrix}
0 & \rho_1 & 0 & 0 & 0 \\
0 & 0 & \rho_1 & 0 & 0 \\
0 & 0 & 0 & \rho_1 & 0 \\
0 & 0 & 0 & 0 & \rho_1 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}, \quad
W_2 = \begin{bmatrix}
0 & 0 & \rho_2 & 0 & 0 \\
0 & 0 & 0 & \rho_2 & 0 \\
0 & 0 & 0 & 0 & \rho_2 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix},
$$

$$
W_3 = \begin{bmatrix}
0 & 0 & 0 & \rho_3 & 0 \\
0 & 0 & 0 & 0 & \rho_3 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}, \quad
W_4 = \begin{bmatrix}
0 & 0 & 0 & 0 & \rho_4 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$

Then, the generalized least-squares weighted-segment area-frame estimator is 

$$\hat{\alpha} = (D'\hat{W}^{-1}D)^{-1} D'\hat{W}^{-1} Z,$$

where $\hat{W}$ is the estimator for the correlation matrix, $W$. The covariance matrix of $\hat{\alpha}$ is 
estimated by 

$$\widehat{\mathrm{Cov}}(\hat{\alpha}) = (D'\hat{\Sigma}^{-1}D)^{-1},$$

where $\hat{\Sigma} = 5\hat{W}$ is the covariance matrix of $Z$, whose units are such that the estimated variance 
of the weighted-segment area-frame estimator is one. The estimated covariance matrices, 
$\widehat{\mathrm{Cov}}(\hat{\alpha})$, for the three livestock types are given in Table 6. We see that the estimators obtained 
using the individual rotation group estimates are about 10% more efficient than the 
weighted-segment area-frame estimators for 1986. 
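
The rotation-group calculation can be reproduced in outline with the Python sketch below, which uses the hogs and pigs correlations of Table 5. The correlation matrix is built directly from (year, rotation group) labels, so no particular ordering of groups within a year is assumed. The variable and helper names are illustrative, and the sketch is not the authors' program.

```python
RHO = [1.000, 0.606, 0.478, 0.365, 0.304]  # Table 5, hogs and pigs
YEARS, GROUPS_PER_YEAR = 5, 5

# One observation per (year, rotation group); year t uses groups t, ..., t+4.
cells = [(t, j) for t in range(1, YEARS + 1) for j in range(t, t + GROUPS_PER_YEAR)]


def correlation(a, b):
    # Estimators are correlated only when they come from the same group.
    (t, j), (s, k) = a, b
    return RHO[abs(t - s)] if j == k else 0.0


def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Covariance matrix of the 25 rotation-group estimators: 5 times the
# correlation matrix, so that a full-sample estimator has variance one.
Sigma = [[GROUPS_PER_YEAR * correlation(a, b) for b in cells] for a in cells]
# Design matrix: each observation loads on the inventory of its own year.
D = [[1.0 if t == year else 0.0 for year in range(1, YEARS + 1)] for t, _ in cells]

Sinv_D = solve(Sigma, D)
A = [[sum(D[r][p] * Sinv_D[r][q] for r in range(len(cells))) for q in range(YEARS)]
     for p in range(YEARS)]
identity = [[1.0 if p == q else 0.0 for q in range(YEARS)] for p in range(YEARS)]
cov_alpha = solve(A, identity)  # estimated covariance matrix of the five inventories
```

The diagonal of `cov_alpha` can be compared with the hogs and pigs panel of Table 6. Values below one indicate a gain over the ordinary weighted-segment area-frame estimator, whose variance is one in these units.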

The optimal weights for the vector of individual rotation estimates are 


$$(D'\hat{W}^{-1}D)^{-1} D'\hat{W}^{-1}.$$


The weights are available from the authors. 
The generalized least squares procedure can be applied to other combinations of rotation 
group and year estimators, but the results suggest that additional gains would be modest. 


Table 6 


Estimated covariance matrices for weighted-segment area-frame 
estimators using information in the rotation scheme 


                    1982      1983      1984      1985      1986 

Hogs and pigs 
  1982              0.899     0.436     0.283     0.180     0.124 
  1983              0.436     0.857     0.412     0.273     0.180 
  1984              0.283     0.412     0.844     0.412     0.283 
  1985              0.180     0.273     0.412     0.857     0.436 
  1986              0.124     0.180     0.283     0.436     0.899 

Sows and gilts 
  1982              0.908     0.429     0.272     0.167     0.099 
  1983              0.429     0.866     0.405     0.262     0.167 
  1984              0.272     0.405     0.853     0.405     0.272 
  1985              0.167     0.262     0.405     0.866     0.429 
  1986              0.099     0.167     0.272     0.429     0.908 

Cattle and calves 
  1982              0.914     0.438     0.264     0.135     0.061 
  1983              0.438     0.870     0.412     0.253     0.135 
  1984              0.264     0.412     0.856     0.412     0.264 
  1985              0.135     0.253     0.412     0.870     0.438 
  1986              0.061     0.135     0.264     0.438     0.914 


5. CONCLUSION 


The composite estimator suggested in this paper provides a method for combining the values 
of several estimators for livestock inventories. The composite estimator uses the values of the 
different area-frame and multiple-frame estimators in several preceding years, as well as the 
values in the year for which the official estimate is sought. The optimal linear combination 
of the six estimators within a particular year has a variance that is two to twelve percent less 
than that of the weighted-segment multiple-frame estimator. Including the estimators from 
the other four years produces an additional reduction of one to two percent in the variance 
of the composite estimator for the current year. The data required to calculate the weighted- 
segment multiple-frame estimator are those required for the other five area- and multiple-frame 
estimators. The greatest effort required in constructing the composite estimator is the estima- 
tion of the covariance matrix for the estimators over the years in which sample data are 
available. Because the variances are relatively constant over years, the weight vector can be 
calculated in advance and applied to the estimates of the current year. Then the marginal ef- 
fort required for the composite estimator during the estimation year is very small. 


ACKNOWLEDGMENTS 


This research was partially supported by Research Agreement No. 53-319T-6-00073 with 
the National Agricultural Statistics Service of the U.S. Department of Agriculture. The authors 
thank Ron Fecso and Vic Tolomeo for their assistance. Ron Fecso’s comments on an earlier 
draft are gratefully acknowledged. The work was conducted when the first author was on study 
leave at Iowa State University. 


Survey Methodology, June 1989 


REFERENCES 


ALLEN, R., CLAMPET, G., DUNKERLEY, C., TORTORA, R., and VOGEL, F. (1983). Framework 
for the Future. Statistical Reporting Service, U.S. Department of Agriculture, Washington, D.C. 


BYNUM, H., DOWDY, W., HANUSCHAK, G., HUDSON, C., MURPHY, R., STEINBERG, J., and 
VOGEL, F.A. (1985). Crop Reporting Board Standards. Statistical Reporting Service, U.S. Depart- 
ment of Agriculture, Washington, D.C. 


COCHRAN, W.G. (1977). Sampling Techniques. New York: Wiley. 


FULLER, W.A., KENNEDY, W., SCHNELL, D., SULLIVAN, G., and PARK, H.J. (1986). PC CARP. 
Statistical Laboratory, Iowa State University, Ames, Iowa. 


HOUSEMAN, E.E. (1975). Area Frame Sampling in Agriculture. SRS Report No. 20. Statistical Reporting
Service, U.S. Department of Agriculture, Washington, D.C.


KUO, L. (1986). Composite Estimation of Totals for Livestock Surveys. SF & SRB Staff Report No. 
92. Statistical Reporting Service, U.S. Department of Agriculture, Washington, D.C. 


NEALON, J.P. (1984). Review of the Multiple and Area Frame Estimators. SF & SRB Staff Report No. 
80. Statistical Reporting Service, U.S. Department of Agriculture, Washington, D.C. 


SCHNELL, D., KENNEDY, W.J., SULLIVAN, G., PARK, H.J., and FULLER, W.A. (1988). Personal 
computer variance software for complex surveys. Survey Methodology 14, 59-69. 




Survey Methodology, June 1989 29 
Vol. 15, No. 1, pp. 29-45 
Statistics Canada 


Modelling and Estimation for Repeated Surveys 
D.A. BINDER and J.P. DICK¹


ABSTRACT 


Estimation of the means of a characteristic for a population at different points in time, based on a series 
of repeated surveys, is briefly reviewed. By imposing a stochastic parametric model on these means, it 
is possible to estimate the parameters of the model and to obtain alternative estimators of the means 
themselves. We describe the case where the population means follow an autoregressive-moving average 
(ARMA) process and the survey errors can also be formulated as an ARMA process. An example using 
data from the Canadian Travel Survey is presented. 


KEY WORDS: Kalman filter; Overlapping surveys; State-space models; Time series modelling; Small 
area estimates. 


1. INTRODUCTION 


When surveys with similar data items are conducted on repeated occasions, certain estima- 
tion and data analysis methods are available which are not possible with single occasion surveys. 
For example, efficient estimation methods for the current occasion can depend on data from 
previous occasions. This occurs when there are overlapping sampling units between occasions 
and, hence, the survey errors can be correlated over time. As well, the series of estimates from 
a repeated survey are often modelled by the data users. A common example of this is to assume 
an autoregressive-moving average (ARMA) model. However, most existing procedures for 
estimating the unknown parameters of this model assume that the input data are not subject 
to survey error. 

In this paper we develop procedures for estimating these model parameters when the data
contain survey errors. The covariance structures we consider for the survey errors include some
cases where the survey errors are correlated over time.

When such a model for the behaviour of the population characteristics is assumed, the 
minimum mean squared error (MMSE) linear estimator can be derived. This estimator incor- 
porates the model structure which the classical minimum variance linear unbiased estimator 
(MVLUE) ignores. The MVLUE is discussed in Section 2. 

Blight and Scott (1973), Scott and Smith (1974), Scott, Smith and Jones (1977), R.G. Jones 
(1980) and others considered the implications of such stochastic models for the population 
means over time. These results and a more general formulation using state-space models and 
Kalman filters are discussed in Section 3, for the case where the stochastic model for the popula- 
tion characteristics is completely specified. These methods can be developed in a setting which 
is equivalent to a Bayes formulation, where the prior distribution is completely specified. 

When the assumed model is an ARMA process in the presence of survey errors, the state-space
formulation can be used to derive the maximum likelihood estimates of the unknown
parameters. We note that this approach can be viewed as empirical Bayes. We assume that
the survey errors can be described through an ARMA process up to a multiplicative factor.
This is discussed in Section 4.

¹ D.A. Binder and J.P. Dick, Social Survey Methods Division, Statistics Canada, 4th Floor, Jean Talon Building,

An example of this model is described in Section 5 using data from the Canadian Travel 
Survey. This example shows the implications on the estimates of the model parameters when 
the survey errors are taken into account. We also derive a smoothed estimate of the underlying 
process under the model assumptions. In this example, the survey errors are independent, so 
that the full machinery of the general formulation in this paper is not required. However, the 
example demonstrates that the impact of ignoring the survey errors even in this case can be 
appreciable. 

Section 6 contains some concluding remarks. 


2. MINIMUM VARIANCE LINEAR UNBIASED ESTIMATION 
IN OVERLAPPING REPEATED SURVEYS 


In this section we briefly review the literature for the case where the population values of 
a characteristic such as a mean or total are taken as fixed unknown constants. In Section 3, 
we study the case where a stochastic model is assumed for the population characteristic. 

In overlapping surveys, where the same individual provides responses on repeated occasions, 
the sampling errors between occasions are usually correlated. Correlations can also occur in 
a multi-stage survey where some of the first stage sampling units overlap, even though the 
ultimate respondents differ. 

Estimators which ignore these correlations and use only the data collected in the single 
reference period are in general inefficient relative to the minimum variance linear unbiased 
estimator (MVLUE). The relative efficiency depends on the size of the correlation of the 
sampling errors between occasions. When the correlations are zero, as in our example in Sec- 
tion 5, the MVLUE is simply the estimator based on data from a single reference period. 

Jessen (1942) was the first to incorporate the overlapping information from the same indi- 
vidual on two successive occasions. Patterson (1950) provided a general theory for repeated 
surveys with overlapping units. He considered in detail the special case of simple random 
sampling from an infinite population, where the correlation for individuals is exponentially 
declining in time lag. On each occasion, a sample of individuals is removed from the sample
of the previous occasion and a sample of individuals is added. All data are collected with 
reference to the current occasion only. Patterson derived the MVLUE for this setup. 

Extensions have been made to the basic assumptions of Patterson (1950). Eckler (1955) called 
Patterson’s design one-level rotation sampling. Eckler derived the MVLUE when individuals 
report for two successive time periods, which he termed two-level rotation sampling. He also 
derived the MVLUE for surveys with higher order rotation sampling designs. 

Rao and Graham (1964) relaxed the infinite population assumption by incorporating the 
finite population correction factor into the variances of the survey error. Singh (1968) was the 
first to consider multi-stage designs. He examined two-stage sampling with the assumption that 
the correlation between responses on different occasions can be considered in two parts: (i)
the correlation between second stage units (SSU’s) within primary sampling units (PSU’s) and 
(ii) the correlation between PSU means on successive occasions. If both of these correlation 
patterns are assumed to be that of a first order autoregressive process, then the form of the 
MVLUE follows the general form given by Patterson (1950).

Tikkiwal (1979) and others considered the implications of relaxing the assumption of a first
order autoregressive correlation pattern. Tikkiwal concluded that if a completely general cor- 
relation structure is assumed, the simple form of the MVLUE is lost and approximations must 
be used in practice. Rao and Graham (1964) and Gurney and Daly (1965) proposed the use 
of composite estimators which are approximations to the optimal estimators. These estimators 
are easily implemented and have high relative efficiency. For a discussion on the use of these 
estimators, see Binder and Hidiroglou (1988). 

Gurney and Daly (1965) also generalized the results of Patterson (1950) to a linear model 
framework. They introduced the concept of an “elementary estimate”. This is an estimate which
uses data from a specific time period, based on individuals who all join and leave the survey
at the same time. The expected value of these elementary estimates can be expressed as a linear
combination of the population parameters, $\{\theta_t\}$. When the correlation structure is known,
standard general linear model theory can be used to derive the MVLUE. 

To formalize this discussion, let $y_{tj}$ be the $j$-th elementary estimate from the $t$-th time
period, where $E(y_{tj}) = \theta_t$. If $Y$ and $\Theta$ are vectors with components $y_{tj}$ and $\theta_t$
respectively, we can write:


$$Y = X\Theta + e, \tag{2.1}$$


where $X$ is a fixed $(n \times T)$ matrix of 0's and 1's, $E(e) = 0$ and $E(ee') = U$, which is the
known variance-covariance matrix of the elementary estimates. Thus, the MVLUE is given by:


$$\hat{\Theta} = (X'U^{-1}X)^{-1}X'U^{-1}Y, \tag{2.2a}$$

with

$$\mathrm{Var}(\hat{\Theta}) = (X'U^{-1}X)^{-1}. \tag{2.2b}$$
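
As a minimal sketch of how (2.2a) and (2.2b) might be evaluated numerically, the following Python code treats the MVLUE as a generalized least squares problem. The function name `mvlue`, the two-occasion layout of `X`, the correlation of 0.6 and the data values are all illustrative assumptions, not quantities taken from any survey discussed here.

```python
import numpy as np

def mvlue(X, U, Y):
    """Evaluate (2.2a) and (2.2b) for a known covariance matrix U."""
    Uinv_X = np.linalg.solve(U, X)        # U^{-1} X, without forming U^{-1}
    cov = np.linalg.inv(X.T @ Uinv_X)     # (X' U^{-1} X)^{-1}
    theta_hat = cov @ (Uinv_X.T @ Y)      # U is symmetric, so Uinv_X.T = X' U^{-1}
    return theta_hat, cov

# Hypothetical design: two occasions, two elementary estimates per occasion.
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

# Assumed covariance: unit variances, and a correlation of 0.6 between the
# estimates (rows 2 and 3) that come from the panel common to both occasions.
U = np.eye(4)
U[1, 2] = U[2, 1] = 0.6

Y = np.array([10.0, 12.0, 11.0, 13.0])    # made-up elementary estimates
theta_hat, cov = mvlue(X, U, Y)
```

With uncorrelated errors the estimator reduces to the occasion means; with the assumed overlap correlation, the variance of each component falls below the single-occasion value of 0.5.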


These results imply that every new survey would require the updating of all previous 
estimates. However, since estimates from the earlier occasions often have a much smaller effect 
than the recent occasions, composite estimates, such as proposed by Gurney and Daly (1965), 
are simpler to use and have a high relative efficiency. Binder and Hidiroglou (1988) discussed 
the appropriateness of these methods and their application in a number of surveys. In gen- 
eral, they found that good results can be achieved using composite estimators, providing the 
rotation group biases are not substantial. 


3. SIGNAL-NOISE EXTRACTION 


It is quite common for economists and sociologists to treat the underlying parameters, $\{\theta_t\}$,
as random inputs for their stochastic models (Smith 1978). However, if the sampling errors 
associated with the input data are ignored, the estimates of the parameters of the stochastic 
model are biased. 

In this section, we show how the stochastic model assumptions can also be used to obtain 
model-dependent, design-consistent estimators. In Section 4, we discuss maximum likelihood 
estimation of these parameters. Since misspecification of the model could lead to serious biases,
hypothesis testing methods should be used to check the consistency of the model with the data.
The model should also reflect the subject matter knowledge of the underlying phenomenon. 

First we consider the case where the survey errors are independent. (This would be approximately
true for non-overlapping surveys with small sampling fractions.) In this case, the
MVLUE for $\theta_t$ is $\hat{\theta}_t = y_t$. However, by imposing a stochastic model for the sequence of
parameters, $\{\theta_t\}$, an improvement in the mean squared error of the estimate can be achieved.

Scott and Smith (1974) proposed the following model for non-overlapping surveys. They
wrote the model for the survey estimates at time $t$ as:


$$y_t = \theta_t + e_t, \tag{3.1}$$


where the $e_t$'s are independent $N(0, s_t^2)$. They assumed that the sequence of parameters, $\{\theta_t\}$,
can be modelled such that, conditional on $\Theta_{t-1}' = (\theta_1, \ldots, \theta_{t-1})$,


$$\theta_t = \alpha_t'\Theta_{t-1} + \varepsilon_t, \tag{3.2}$$


where the $\varepsilon_t$'s are independent $N(0, \sigma^2)$ and independent of $\{e_t\}$, and $\alpha_t$ is a $(t-1)$
dimensional vector of constants.

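In general, at time $t-1$, conditional on $Y_{t-1} = (y_1, \ldots, y_{t-1})'$, we have
$\Theta_{t-1} \sim N(\hat{\Theta}_{t-1}, V_{t-1})$. Conditional arguments then yield


$$E(\theta_t \mid Y_t) = \hat{\theta}_t = \pi_t(\alpha_t'\hat{\Theta}_{t-1}) + (1 - \pi_t)y_t \tag{3.3a}$$

and

$$\mathrm{Var}(\theta_t \mid Y_t) = (1 - \pi_t)s_t^2, \tag{3.3b}$$

where

$$\pi_t = \frac{s_t^2}{\mathrm{Var}(y_t \mid Y_{t-1})} = \frac{s_t^2}{\alpha_t'V_{t-1}\alpha_t + \sigma^2 + s_t^2}. \tag{3.3c}$$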


Note that the estimator in (3.3a) is a weighted average of two components. The first consists
of the best linear forecast of $\theta_t$ given the previous value of $\hat{\Theta}_{t-1}$; the second consists of
the best estimate of $\theta_t$ from the survey. The contribution of each term is controlled by $\pi_t$, the
ratio of the survey variance to the total variance. As the survey error component becomes small,
the contribution from $\hat{\Theta}_{t-1}$ becomes small and the estimate of $\theta_t$ in (3.3a) is composed
primarily of $y_t$, the estimate from the survey data. Therefore, the estimator of $\theta_t$ is
design-consistent whenever $y_t$ is design-consistent.

However, as the survey error component becomes large, the estimate of $\theta_t$ is due primarily
to the linear forecast from $\hat{\Theta}_{t-1}$. The relative efficiency of the estimator, $\hat{\theta}_t$, in (3.3a) is given
by $1/(1 - \pi_t)$, where $\pi_t$ is defined in (3.3c). The greatest efficiency gains occur when the survey
error is large relative to $\sigma^2$, the variance of the “shocks” of the model process.
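
The weighting in (3.3a) to (3.3c) can be illustrated with a short Python sketch of a single updating step. The function name `scott_smith_step` and every numerical value are hypothetical; the weight is computed as the survey variance divided by the total predictive variance of $y_t$, which is our reading of (3.3c).

```python
import numpy as np

def scott_smith_step(alpha_t, Theta_hat_prev, V_prev, y_t, s2_t, sigma2):
    """One step of (3.3): combine the model forecast with the survey estimate."""
    forecast = alpha_t @ Theta_hat_prev                      # alpha_t' Theta_hat_{t-1}
    total_var = alpha_t @ V_prev @ alpha_t + sigma2 + s2_t   # Var(y_t | Y_{t-1})
    pi_t = s2_t / total_var                                  # weight on the forecast
    theta_hat = pi_t * forecast + (1.0 - pi_t) * y_t         # (3.3a)
    var = (1.0 - pi_t) * s2_t                                # (3.3b)
    return theta_hat, var, pi_t

# Made-up inputs: one previous parameter, forecast 0.8 * 100 = 80,
# survey estimate 90 with survey variance 9, shock variance 6.44.
theta_hat, var, pi_t = scott_smith_step(
    alpha_t=np.array([0.8]),
    Theta_hat_prev=np.array([100.0]),
    V_prev=np.array([[4.0]]),
    y_t=90.0,
    s2_t=9.0,
    sigma2=6.44,
)
relative_efficiency = 1.0 / (1.0 - pi_t)
```

In this illustration the survey variance is half of the total variance, so the forecast and the survey estimate receive equal weight and the variance of the estimate is half the survey variance.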

Scott and Smith (1974) and R.G. Jones (1980) also considered the case of overlapping
surveys. Jones' formulation for this case was as follows. Let $\Theta_t$ be multivariate normal with
mean zero and variance matrix $V_t^*$. Now the observations at time $t$ may be generalized to a
vector of elementary estimates, $y_t$. The conditional distribution of $Y_t = (y_1', \ldots, y_t')'$ given
$\Theta_t$ is assumed to be of the form:


$$Y_t = X_t\Theta_t + e_t, \tag{3.4}$$


where $X_t$ is a fixed matrix of 0's and 1's linking the parameters and the observations, and $e_t$ is
the survey error, assumed to be multivariate normal with mean zero and covariance matrix $U_t$.
Using conditional arguments, the best estimate of $\Theta_t$ given $Y_t$ is:


$$E(\Theta_t \mid Y_t) = \hat{\Theta}_t = (X_t'U_t^{-1}X_t + V_t^{*-1})^{-1}X_t'U_t^{-1}Y_t, \tag{3.5a}$$


with a variance of


$$\mathrm{Var}(\Theta_t \mid Y_t) = (X_t'U_t^{-1}X_t + V_t^{*-1})^{-1}. \tag{3.5b}$$


This result is very general. If we allow the underlying stochastic model for $\Theta_t$ to be very
diffuse, then the inverse of $V_t^*$ is approximately zero, thus yielding the MVLUE given by (2.2a).
R.G. Jones (1980) derived (3.5) by application of stochastic least squares, so that the estimator
$\hat{\Theta}_t$ is the minimum mean squared error (MMSE) linear estimator, even when the normality
assumptions are dropped.
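
The relationship between (3.5) and the MVLUE can be checked numerically. The sketch below is illustrative only: the function name `jones_estimate`, the design matrix, the prior covariance `V_prior` and the data are assumptions made for the example. It evaluates (3.5a) and (3.5b) once under an informative model covariance and once under a very diffuse one.

```python
import numpy as np

def jones_estimate(X, U, V_star, Y):
    """Evaluate (3.5a) and (3.5b) for a zero-mean model with covariance V_star."""
    precision = X.T @ np.linalg.solve(U, X) + np.linalg.inv(V_star)
    cov = np.linalg.inv(precision)                      # (3.5b)
    theta_hat = cov @ (X.T @ np.linalg.solve(U, Y))     # (3.5a)
    return theta_hat, cov

# Hypothetical design: two occasions, two elementary estimates per occasion,
# with independent unit-variance survey errors.
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
U = np.eye(4)
Y = np.array([10.0, 12.0, 11.0, 13.0])

# Assumed model covariance for the two population means.
V_prior = np.array([[4.0, 3.0], [3.0, 4.0]])
theta_model, cov_model = jones_estimate(X, U, V_prior, Y)

# A very diffuse model: the inverse of V_star is close to zero.
theta_diffuse, cov_diffuse = jones_estimate(X, U, 1e8 * np.eye(2), Y)
```

Under the diffuse model the result agrees with the occasion means, which is the MVLUE for this design, while the informative model gives a smaller posterior variance.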

Applying (3.5) directly would involve inverting matrices which have the same dimensionality 
as the vector of all the elementary estimates for all time periods. Computing such inverses can 
be numerically unstable. However, expression (3.5) can often be restructured using state-space 
models, which are useful for describing many time series models. See Harvey (1984) for a review 
of such models. As we demonstrate below, this would avoid the inversion of large matrices. 
Some structure for $\{\theta_t\}$ and $\{e_t\}$ would be required to take advantage of the reduction in
dimensionality afforded by the state-space approach. An example of such a structure, which 
is often used in time series applications, is an autoregressive-moving average (ARMA) process, 
not necessarily homogeneous in time. 

For applications such as small area estimation, where the sample size is not large, modelling 
the variances of the survey error, $U_t$, using such ARMA models can be useful. This is not
usually done for repeated surveys. This would also alleviate the problem of applying the result
in (3.5) directly when the dimensions of $V_t^*$ and $U_t$ are large and the inverses are numerically
unstable. 

In the state-space model, two processes occur simultaneously. The first process, the observa- 
tion system, details how the observations depend on the current state of the process parameters. 
The second process, the transition system, details how the parameters evolve over time. 

State-space models can be written as follows. The observation equation is written as:


$$y_t = H_tz_t + \omega_t, \tag{3.6a}$$


and the transition equation is written as:


$$z_t = F_tz_{t-1} + G_t\varepsilon_t, \tag{3.6b}$$


where $z_t$ is an $(r \times 1)$ state vector, $H_t$ is a fixed $(n_t \times r)$ matrix, $F_t$ is a fixed $(r \times r)$
transition matrix, $G_t$ is a fixed $(r \times m)$ matrix and $\omega_t$ and $\varepsilon_t$ are independent random disturbances
with mean zero and covariances given by $E(\omega_t\omega_t') = U_t$ and $E(\varepsilon_t\varepsilon_t') = V_t$.

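As an example of this formulation, we rewrite the model studied by Blight and Scott (1973)
in terms of the state-space model. Blight and Scott considered data from Patterson's (1950)
one-level rotation design. They let $\bar{y}_t''$ be the mean of the new units at time $t$, and $\bar{y}_t'$ and $\bar{x}_t'$
the means of the overlapping units at times $t$ and $t-1$, respectively. They assumed that $\bar{y}_t''$ and
$\bar{y}_t' - \rho\bar{x}_t'$ are independent observations at time $t$, where $\rho$ is the between-occasion correlation
of the responses from the same individual. They also assumed that the mean process $\{\theta_t\}$ is
first order autoregressive.

We let the state vector be $z_t' = (\theta_t, \theta_{t-1})$. The observation equation can be written as:


$$\begin{pmatrix} \bar{y}_t'' \\ \bar{y}_t' - \rho\bar{x}_t' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 1 & -\rho \end{pmatrix} \begin{pmatrix} \theta_t \\ \theta_{t-1} \end{pmatrix} + \begin{pmatrix} \omega_{1t} \\ \omega_{2t} \end{pmatrix},$$


where $(\omega_{1t}, \omega_{2t})'$ has a diagonal covariance matrix.

The transition equation would be written as:


$$\begin{pmatrix} \theta_t \\ \theta_{t-1} \end{pmatrix} = \begin{pmatrix} \alpha & 0 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \theta_{t-1} \\ \theta_{t-2} \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \end{pmatrix} \varepsilon_t,$$


where $\varepsilon_t$ is $N(0, \sigma^2)$. Thus, the Blight-Scott model can be written in state-space form.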

Harvey and Phillips (1979) described a method to put the ARMA $(p,q)$ model, defined by:


$$y_t = \alpha_1y_{t-1} + \cdots + \alpha_py_{t-p} + \varepsilon_t - \beta_1\varepsilon_{t-1} - \cdots - \beta_q\varepsilon_{t-q}, \tag{3.7}$$


where the $\varepsilon_t$'s are independent $N(0, \sigma^2)$, into state-space form. The dimension of $z_t$ is
$r = \max(p, q+1)$. Where necessary, $\alpha = (\alpha_1, \ldots, \alpha_p)'$ or $\beta = (\beta_1, \ldots, \beta_q)'$ is augmented
with zeroes to have dimension $r$. The matrix $U_t$ is set to zero. The ARMA $(p,q)$ model is
equivalent to (3.6) when $H_t = (1, 0, \ldots, 0)$, $G_t = (1, -\beta_1, \ldots, -\beta_{r-1})'$ and


$$F_t = \begin{pmatrix} \alpha & \begin{matrix} I_{r-1} \\ 0' \end{matrix} \end{pmatrix},$$


where $I_{r-1}$ is the $(r-1) \times (r-1)$ identity matrix and $0'$ is a row vector of zeroes.
In this formulation, the state vector $z_t = (z_{1t}, \ldots, z_{rt})'$ is defined as follows:


$$z_{it} = \alpha_iy_{t-1} + \alpha_{i+1}y_{t-2} + \cdots + \alpha_ry_{t-(r-i+1)} - \beta_{i-1}\varepsilon_t - \beta_i\varepsilon_{t-1} - \cdots - \beta_{r-1}\varepsilon_{t-(r-i)},$$


for $i = 2, 3, \ldots, r$ and $z_{1t} = y_t$.
A necessary condition for stationarity is that $\mathrm{Var}(z_t) = \mathrm{Var}(z_{t-1})$ for all $t$. From
expression (3.6b), we see that this implies that


$$\mathrm{Var}(z) = F\,\mathrm{Var}(z)\,F' + GVG',$$


where $V_t = V$ is constant for all $t$. Pearlman (1980) pointed out that this can be used to obtain
the initial conditions for $z_t$.
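
One way to solve the stationarity condition for $\mathrm{Var}(z)$ is to vectorize it into a linear system. The Python sketch below does this with a Kronecker product; it is a generic linear-algebra device and is not claimed to be Pearlman's algorithm. The function name `stationary_cov` and the ARMA (1,1) parameter values are assumptions chosen for illustration, with time-constant $F$, $G$ and $V$.

```python
import numpy as np

def stationary_cov(F, G, V):
    """Solve P = F P F' + G V G' for P, assuming the process is stationary."""
    r = F.shape[0]
    Q = G @ V @ G.T
    # vec(F P F') = (F kron F) vec(P), so (I - F kron F) vec(P) = vec(Q).
    vec_P = np.linalg.solve(np.eye(r * r) - np.kron(F, F), Q.reshape(-1))
    return vec_P.reshape(r, r)

# Hypothetical ARMA(1,1): y_t = a*y_{t-1} + eps_t - b*eps_{t-1}, Var(eps_t) = 1.
a, b = 0.5, 0.3
F = np.array([[a, 1.0], [0.0, 0.0]])
G = np.array([[1.0], [-b]])
V = np.array([[1.0]])
P0 = stationary_cov(F, G, V)
```

The leading element of `P0` is the stationary variance of $y_t$, which for this ARMA (1,1) process has the closed form $(1 - 2ab + b^2)/(1 - a^2)$ when the shock variance is one.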

Often the survey error process can be included in the state-space model, when some structure
for the survey errors can be assumed. We have already demonstrated this for the Blight
and Scott (1973) model. Scott and Smith (1974) and Miazaki (1985) considered a variety of
models which were special cases of $\{\theta_t\}$ being ARMA $(p,q)$, $\{e_t\}$ being ARMA $(p^*,q^*)$ and
the scalar observations satisfying $y_t = \theta_t + e_t$. State-space models for this process can be
formulated analogously to the Harvey-Phillips representation above, where the state vector
$z_t$ is the vector formed by concatenating the state vectors from each of the individual ARMA
processes.

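For example, suppose $\{\theta_t\}$ is an ARMA (3,0) process with parameter $(\alpha_1, \alpha_2, \alpha_3)$ and
model variance $\sigma^2$, and $\{e_t\}$ is an ARMA (0,1) process with parameter $\beta^*$ and model variance
$s^2$. An ARMA (0,1) process for $\{e_t\}$ would be plausible for a survey which follows Eckler's
two-level rotation sampling pattern, where the survey estimate for $\theta_t$ is given by $\bar{y}_t$, the mean
of all individuals reporting for the $t$-th occasion.

This can be written in state-space form by letting


$$F_t = \begin{pmatrix} \alpha_1 & 1 & 0 & 0 & 0 \\ \alpha_2 & 0 & 1 & 0 & 0 \\ \alpha_3 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad G_t = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & -\beta^* \end{pmatrix}, \quad V_t = \begin{pmatrix} \sigma^2 & 0 \\ 0 & s^2 \end{pmatrix}, \tag{3.8}$$


$U_t = 0$ and $H_t = (1\ 0\ 0 \mid 1\ 0)$. The first three components of the state vector correspond to
the state-space formulation for the $\{\theta_t\}$ process and the last two components are for the $\{e_t\}$
process.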
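
The concatenation of the two ARMA representations can be sketched in Python as follows. The helper names `harvey_phillips` and `combine`, and the parameter values, are hypothetical; the construction simply places the matrices of each component process in block-diagonal form and adds the two leading state components in the observation row.

```python
import numpy as np

def harvey_phillips(alpha, beta):
    """F, G, H for an ARMA(p, q) process in the form described around (3.7)."""
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    p, q = len(alpha), len(beta)
    r = max(p, q + 1)
    a = np.zeros(r)
    a[:p] = alpha                        # alpha padded with zeroes to length r
    b = np.zeros(r - 1)
    b[:q] = beta                         # beta padded with zeroes to length r - 1
    F = np.zeros((r, r))
    F[:, 0] = a                          # first column holds alpha
    F[:-1, 1:] = np.eye(r - 1)           # identity block above a row of zeroes
    G = np.concatenate(([1.0], -b)).reshape(r, 1)
    H = np.zeros((1, r))
    H[0, 0] = 1.0
    return F, G, H

def combine(signal, error):
    """Stack the signal and survey-error representations so y_t = theta_t + e_t."""
    (F1, G1, H1), (F2, G2, H2) = signal, error
    r1, r2 = F1.shape[0], F2.shape[0]
    F = np.zeros((r1 + r2, r1 + r2))
    F[:r1, :r1], F[r1:, r1:] = F1, F2
    G = np.zeros((r1 + r2, 2))
    G[:r1, :1], G[r1:, 1:] = G1, G2
    return F, G, np.hstack([H1, H2])

# Illustrative parameters: ARMA(3,0) signal and ARMA(0,1) survey error.
F, G, H = combine(harvey_phillips([0.5, 0.2, 0.1], []),
                  harvey_phillips([], [0.4]))
```

The resulting state vector has five components, three for the signal and two for the survey error, and the observation row picks out the first component of each block.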

Note that the state-space approach allows for measurement error, given by $\omega_t$ in (3.6a).
However, unless the survey design has non-overlapping units with independent sampling errors,
the measurement error terms cannot be used to model the survey error. Instead, we have
absorbed the measurement (survey) error into the state vector.

From the general state-space framework, the Kalman filter equations can be derived. If,
as in Meinhold and Singpurwalla (1983), we let the conditional distribution of $z_{t-1}$ given $Y_{t-1}$
be $N(\hat{z}_{t-1|t-1}, P_{t-1|t-1})$, then recursive relationships for $\hat{z}_{t|t}$ and $P_{t|t}$ can be constructed.
Harvey (1984) shows these relationships are equivalent to the Kalman filter.

The Kalman filter, in general, consists of two parts. The first is a one-step ahead prediction 
of the state vector and its covariance; the second part provides an update of the mean and 
covariance matrix of the state-space vector after the new observations are available. 

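Following the notation used in (3.6), we let $Y_1 = y_1$ and $Y_{t+1} = (Y_t', y_{t+1}')'$. Then the
one-step ahead prediction has a mean and variance given by


$$E(z_1) = \hat{z}_{1|0}, \tag{3.9a}$$

$$\mathrm{Var}(z_1) = P_{1|0}, \tag{3.9b}$$

$$E(z_t \mid Y_{t-1}) = \hat{z}_{t|t-1} = F_t\hat{z}_{t-1|t-1}, \tag{3.9c}$$

$$\mathrm{Var}(z_t \mid Y_{t-1}) = P_{t|t-1} = F_tP_{t-1|t-1}F_t' + G_tV_tG_t'. \tag{3.9d}$$


The update of the mean and variance for the state vector at time $t$ after the observation at
time $t$ becomes available is:


$$E(z_t \mid Y_t) = \hat{z}_{t|t} = \hat{z}_{t|t-1} + P_{t|t-1}H_t'(H_tP_{t|t-1}H_t' + U_t)^{-1}(y_t - H_t\hat{z}_{t|t-1}), \tag{3.10a}$$

$$\mathrm{Var}(z_t \mid Y_t) = P_{t|t} = P_{t|t-1} - P_{t|t-1}H_t'(H_tP_{t|t-1}H_t' + U_t)^{-1}H_tP_{t|t-1}. \tag{3.10b}$$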


The equations (3.9) and (3.10) are the well-known Kalman filter equations. The formula- 
tion followed here is essentially Bayesian; however, it is possible to derive equivalent results 
using orthogonal projections; see Young (1984). 
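
A minimal Python sketch of one prediction step and one update step of the recursions (3.9c), (3.9d), (3.10a) and (3.10b) is given here. The function names and the scalar example values are illustrative assumptions; the code uses a plain matrix inverse for clarity rather than a numerically refined form.

```python
import numpy as np

def kalman_predict(z, P, F, G, V):
    """One-step ahead prediction, (3.9c) and (3.9d)."""
    return F @ z, F @ P @ F.T + G @ V @ G.T

def kalman_update(z_pred, P_pred, y, H, U):
    """Update after observing y, (3.10a) and (3.10b)."""
    R = H @ P_pred @ H.T + U                  # variance of the prediction error
    K = P_pred @ H.T @ np.linalg.inv(R)       # gain matrix
    z = z_pred + K @ (y - H @ z_pred)
    P = P_pred - K @ H @ P_pred
    return z, P

# Hypothetical scalar model: AR(1) state observed with independent error.
F = np.array([[0.8]])
G = np.array([[1.0]])
V = np.array([[1.0]])
H = np.array([[1.0]])
U = np.array([[2.0]])

z_pred, P_pred = kalman_predict(np.array([0.0]), np.array([[1.0]]), F, G, V)
z_filt, P_filt = kalman_update(z_pred, P_pred, np.array([3.0]), H, U)
```

Only matrices of the dimension of the state vector and of the current observation are inverted, which is the computational advantage over applying (3.5) directly.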

The simplification in the computations due to the Kalman filter formulation in the sample 
survey setting can be seen by comparing equations (3.9) and (3.10) with R.G. Jones’ (1980) 
result (3.5). Note that Jones’ result required the inversion of a matrix with dimensionality given 
by the complete vector of survey estimates. 

The Kalman filter can also be used to obtain smoothed estimates given by $E(z_t \mid Y_T)$ for
$T > t$. Details of this backcasting may be found in Harvey (1984).


Remarks 


1. Although the Kalman filter assumes an infinite population model, when the sample survey
is based on a large sample, the central limit theorem often allows the survey errors to be
approximately normally distributed. As well, since the smoothed estimators for $\{\theta_t\}$ are
the same as those obtained by R.G. Jones (1980) in (3.5a), these are the linear MMSE
estimators even when the normality assumptions are dropped.


2. Missing time points can be incorporated in the state-space approach. If $y_t$ is missing at time
$t$, then the updating equations analogous to (3.10) become $\hat{z}_{t|t} = \hat{z}_{t|t-1}$ and $P_{t|t} = P_{t|t-1}$, as
in R.H. Jones (1980). However, smoothed estimates for the missing time points will depend
strongly on the model selected, since no survey estimate is available. Therefore, the risks 
of model misspecification here are high. 


3. The likelihood function, which we discuss in Section 4 for obtaining the maximum likelihood 
estimates of the unknown parameters, can also be obtained when some data are missing, 
using the same approach given by R.H. Jones (1980). However, missing data will tend to 
increase the standard errors of the parameter estimates. In our example of Section 5, we 
encounter a case with missing time points. 
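
The treatment of missing time points described in Remark 2 can be sketched for a scalar model as follows. The function name `filter_with_gaps`, the use of `None` to mark a missing survey estimate, and the parameter values are assumptions made for this illustration.

```python
def filter_with_gaps(ys, a, sigma2, s2, z0, P0):
    """Scalar Kalman filter for z_t = a*z_{t-1} + eps_t, y_t = z_t + e_t.

    Entries of ys equal to None are treated as missing: the prediction is
    carried forward unchanged, as in Remark 2.
    """
    z, P = z0, P0
    results = []
    for y in ys:
        z_pred = a * z
        P_pred = a * a * P + sigma2
        if y is None:
            z, P = z_pred, P_pred              # no update when y_t is missing
        else:
            k = P_pred / (P_pred + s2)         # scalar gain
            z = z_pred + k * (y - z_pred)
            P = (1.0 - k) * P_pred
        results.append((z, P))
    return results

# Made-up series with the second time point missing.
results = filter_with_gaps([1.0, None, 2.0], a=0.8, sigma2=1.0, s2=2.0,
                           z0=0.0, P0=1.0)
```

At the missing time point the filtered variance rises to the prediction variance and then falls again once the next survey estimate becomes available.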


4. ESTIMATION OF THE PARAMETERS IN A STATE-SPACE MODEL 


When data are generated from the ARMA model (3.7) and the parameters $\alpha$, $\beta$ and $\sigma^2$ are
unknown, the maximum likelihood estimates for the unknown parameters can be obtained
using the likelihood function derived from the state-space model. This approach was suggested 
by Harvey and Phillips (1979), R.H. Jones (1980) and others. 

The usual state-space models can also be used when the input data have independent measure- 
ment errors. This is the case for our example of Section 5, where we show the effect on the 
parameter estimates when the survey errors are taken into account. 

Maximum likelihood estimation of these parameters when the data have correlated survey 
errors has not previously been studied in detail. For a model with univariate stationary
observations $\{y_t\}$, Scott, Smith and Jones (1977) suggested using the estimated autocovariance
function of the observations $\{y_t\}$ to estimate the parameters of the ARMA process. Here, the
data model is $y_t = \theta_t + e_t$. The variances and covariances of the survey errors, $\{e_t\}$, can be
estimated using design-based methods; see, for example, Wolter (1985). 

Efficient estimation of the autocovariances of the survey errors, assuming stationarity of 
the series, is an area which has not received attention in the literature, so ad hoc methods would 
be used in practice. Future research in modelling these survey errors would be worthwhile. In 
our example in Section 5, we could assume independent survey errors, so this was not 
problematic. 

Assuming the autocovariance of $\{e_t\}$ is available, the autocovariance of $\{\theta_t\}$ can be
estimated by $\mathrm{Cov}(\theta_t, \theta_{t-s}) = \mathrm{Cov}(y_t, y_{t-s}) - \mathrm{Cov}(e_t, e_{t-s})$. However, this method is not
fully efficient (Smith 1978). Moreover, this method would not incorporate non-stationary
survey errors.
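
The subtraction of autocovariances can be illustrated by a short Python sketch. The function names, the divisor $n$ in the sample autocovariance and the numerical inputs are assumptions for the example; in practice the survey-error autocovariances would come from design-based estimates.

```python
import numpy as np

def sample_autocov(y, max_lag):
    """Sample autocovariances of y at lags 0..max_lag (divisor n)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    return np.array([d[: n - s] @ d[s:] / n for s in range(max_lag + 1)])

def signal_autocov(y, error_autocov):
    """Cov(theta_t, theta_{t-s}) = Cov(y_t, y_{t-s}) - Cov(e_t, e_{t-s})."""
    error_autocov = np.asarray(error_autocov, dtype=float)
    return sample_autocov(y, len(error_autocov) - 1) - error_autocov

# Made-up series and survey-error autocovariances at lags 0 and 1
# (independent survey errors with variance 0.25).
gamma_theta = signal_autocov([1.0, 2.0, 3.0, 4.0], [0.25, 0.0])
```

Nothing in this subtraction guarantees that the resulting sequence is a valid autocovariance function, which is one practical reason to prefer a likelihood-based approach.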

Miazaki (1985) considered the case where $\{\theta_t\}$ is an ARMA $(p,0)$ process. She also assumed
$\{e_t\}$ to be an ARMA $(0,q)$ process which could be estimated directly from the survey.
Miazaki then wrote the observations $\{y_t\}$ as an ARMA $(p,p+q)$ process which she estimated
by restricted maximum likelihood methods.

Representing non-stationarity of survey errors in the state-space representation can
sometimes be handled through nonhomogeneous matrices for $V_t$, the variance matrix of the
random “shocks” from the transition equation (3.6b). For example, in (3.8) $s^2$ would be
replaced by $s_t^2$ to allow for non-homogeneous survey errors. This approach is taken in the
example in Section 5.

In general, for state-space models given by (3.6), Harvey and Phillips (1979) write the exact
likelihood function as follows. Letting


$$\hat{y}_{t|t-1} = E(y_t \mid Y_{t-1}) = H_t\hat{z}_{t|t-1}$$

and

$$R_t = \mathrm{Var}(y_t \mid Y_{t-1}) = H_tP_{t|t-1}H_t' + U_t,$$

the log-likelihood function for $Y_T = (y_1', \ldots, y_T')'$ is, apart from an additive constant,


$$\log f(Y_T) = -\frac{1}{2}\sum_{t=1}^{T}\log|R_t| - \frac{1}{2}\sum_{t=1}^{T}(y_t - \hat{y}_{t|t-1})'R_t^{-1}(y_t - \hat{y}_{t|t-1}). \tag{4.1}$$


The unknown parameters in (4.1) are contained in $\hat{y}_{t|t-1}$ and in $R_t$. Depending on the
algorithm used to maximize (4.1) with respect to the unknown parameters, it may be necessary
to compute first and second derivatives of (4.1) with respect to the unknown parameters. This
generally involves finding derivatives of $\hat{z}_{t|t-1}$ and $P_{t|t-1}$. These can be computed
numerically using the recursions given in (3.9) and (3.10). For example, (3.9c) yields
$\partial\hat{z}_{t|t-1} = (\partial F_t)\hat{z}_{t-1|t-1} + F_t(\partial\hat{z}_{t-1|t-1})$. The other expressions using (3.9) and (3.10)
can be determined similarly.
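
The prediction-error form of the log-likelihood (4.1) can be accumulated alongside the Kalman filter recursions. The Python sketch below is illustrative: the function name `log_likelihood`, the time-constant system matrices and the numerical values are assumptions, and the additive constant involving $2\pi$ is omitted.

```python
import numpy as np

def log_likelihood(ys, z_pred, P_pred, F, G, V, H, U):
    """Evaluate (4.1), without the additive constant, by running the filter.

    z_pred and P_pred are the initial one-step ahead mean and variance.
    """
    ll = 0.0
    for y in ys:
        innov = y - H @ z_pred                     # y_t minus its prediction
        R = H @ P_pred @ H.T + U                   # R_t
        _, logdet = np.linalg.slogdet(R)
        ll -= 0.5 * (logdet + innov @ np.linalg.solve(R, innov))
        K = P_pred @ H.T @ np.linalg.inv(R)        # update step
        z = z_pred + K @ innov
        P = P_pred - K @ H @ P_pred
        z_pred = F @ z                             # prediction step
        P_pred = F @ P @ F.T + G @ V @ G.T
    return ll

# Hypothetical scalar system and two made-up observations.
one = np.array([[1.0]])
ll = log_likelihood(
    ys=[np.array([1.0]), np.array([0.25])],
    z_pred=np.array([0.0]), P_pred=one,
    F=np.array([[0.5]]), G=one, V=one, H=one, U=one,
)
```

A function of this kind could be passed to a general-purpose numerical optimizer, with the model parameters entering through `F`, `G`, `V` and `U`.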

The inclusion of regression parameters into (4.1) can be accomplished by replacing y, by 
the deviation of y, from the regression line. Tam (1987) generalized this concept even further 
by considering a model where the underlying stochastic process is determined by a state-space 
model for the regression coefficients which evolve over time. 

To maximize the likelihood function (4.1) with respect to unknown parameters, an iterative 
procedure is needed. We omit details of the procedure used for the application in Section 5 
since efficient procedures are still in the development stage. 

Once the parameters have been estimated, smoothed values for the state vector,
$\hat{z}_{t|T} = E(z_t \mid Y_T)$ for $T > t$, can be obtained using the backcasting formulae given by the Kalman
filter; see Harvey (1984). Thus, for example, if $y_t = \theta_t + e_t$ as in (3.1), after backcasting we
may formulate $y_t = \hat{\theta}_{t|T} + \hat{e}_{t|T}$, so that $\hat{\theta}_{t|T}$ becomes the smoothed estimate of the mean at
time $t$ after observing $Y_T$.

To derive the standard error of the smoothed estimate it is necessary to account for the fact 
that the unknown parameters have been estimated from the data, particularly when the data 
series is short; see Jones (1979). Hamilton (1986) suggests doing this by Monte Carlo simula- 
tions. He generates a set of multivariate normal random variables with mean given by the max- 
imum likelihood estimates for the parameters and variance given by the inverse of the estimated 
Fisher information matrix. He then estimates $E(P_{t|T})$ and $\mathrm{Var}(\hat{z}_{t|T})$, where the expectation
and variance are taken over the generated parameter values. The sum of these two components 
is the estimated covariance matrix of the estimated state vector. This method assumes that the 
sample size is large, so that the normal approximation to the sampling distribution of the param- 
eter estimates is valid. 
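
The Monte Carlo procedure described above might be organized as in the following Python sketch, written for scalar state components at each time point. The function name `hamilton_total_variance`, the user-supplied `smoother` callable and the toy smoother used in the example are all hypothetical; a real application would supply a smoother that runs the backcasting recursions for a given parameter vector.

```python
import numpy as np

def hamilton_total_variance(smoother, mle, info_inv, n_draws=1000, seed=0):
    """Sum of the mean smoothed variance and the variance of the smoothed mean.

    smoother(params) must return (z_hat, P): arrays holding the smoothed
    estimate and its model-based variance at each time point.
    """
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mle, info_inv, size=n_draws)
    z_draws, P_draws = [], []
    for params in draws:
        z_hat, P = smoother(params)
        z_draws.append(z_hat)
        P_draws.append(P)
    z_draws = np.asarray(z_draws)
    P_draws = np.asarray(P_draws)
    return P_draws.mean(axis=0) + z_draws.var(axis=0)

# Toy smoother, for illustration only: the smoothed values are linear in
# the single parameter and the model-based variances are fixed at one.
def toy_smoother(params):
    return np.array([params[0], 2.0 * params[0]]), np.array([1.0, 1.0])

total = hamilton_total_variance(toy_smoother, mle=[0.5],
                                info_inv=[[0.04]], n_draws=20000)
```

For the toy smoother the two components of `total` should be close to 1.04 and 1.16, that is, the fixed model-based variance plus the parameter variance scaled by the squared coefficient.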

In the examples of Section 5, we approximate the standard deviation of the sampling errors 
of the smoothed estimates, ignoring the variation due to estimating certain model parameters. 
We then compare these with the actual root mean squared errors of the sampling distribution 
obtained from simulated data.


5. DATA ANALYSIS


In this section we show the impact of the survey errors on estimates of the parameters of 
a first order autoregressive model with regression terms. In our example the survey errors are 
assumed to be independent between occasions. More complicated cases with correlated survey 
error and higher order ARMA models for the population characteristic could be handled within 
the framework we have described. We chose this example to demonstrate that the impact of 
accounting for the survey errors can be appreciable even for this relatively simple model. 

We used data from Saskatchewan respondents to the Canadian Travel Survey (CTS). The 
CTS is conducted by Statistics Canada to collect descriptive statistics on the travelling habits 
and characteristics of Canadian residents. This survey is conducted as an ‘‘add-on’’ to the 
Labour Force Survey (LFS). The LFS is a monthly rotating panel survey with six rotation 
groups. However, the CTS is conducted at most four times a year, with at least one, but possibly 
as many as three rotation groups. The rotation groups used by the CTS for the quarters when 
the CTS is conducted are chosen so that there are no overlapping panels between occasions. 

The survey errors are assumed to be independent. This is only approximately true. The LFS 
is a multi-stage survey and the primary sampling units (PSU’s) do not rotate out as quickly 
as the individual rotating panels. The same PSU’s are used on a number of occasions. Therefore, 
although the CTS sample is selected such that the panels do not overlap between occasions, 
the independence assumption is approximately true only when the correlation of the sampling 
errors between quarterly periods within the same PSU is small. This assumption was not 
verified. 

The coefficients of variation (as a percentage) were calculated using the function: 


$$\mathrm{CV} = a\,y^{-\beta} \Big/ \sqrt{\text{number of rotation groups}},$$


where y is the survey estimate in thousands. This is the function recommended to users of the 
CTS for data on Saskatchewan residents; see Statistics Canada (1985). In that report, the 
parameters $a$ and $\beta$ were estimated at 91.7528 and 0.353253, respectively, using a loglinear 
regression model applied to 1979 data. For the purposes of our example, these 
coefficients of variation were rounded to the nearest tenth of a percent. 
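
As a rough check on this function, the following Python sketch evaluates it for two of the survey estimates in Table 1. It assumes the function is a power law in y (coefficient 91.7528, exponent -0.353253) divided by the square root of the number of rotation groups; the names `cts_cv`, `A` and `B` are illustrative and do not come from the survey documentation.

```python
import math

# Parameter values quoted in the text for Saskatchewan residents.
A = 91.7528
B = 0.353253


def cts_cv(estimate_thousands: float, rotation_groups: int) -> float:
    """Approximate CV (in percent) of a CTS estimate expressed in thousands."""
    cv = A * estimate_thousands ** (-B) / math.sqrt(rotation_groups)
    return round(cv, 1)  # the example rounds to the nearest tenth of a percent


print(cts_cv(598, 1))   # Winter 1979, destinations within Saskatchewan
print(cts_cv(1033, 3))  # Summer 1979, destinations within Saskatchewan
```

Under these assumptions the two calls give 9.6 and 4.6, which agree with the Survey C.V. column of Table 1 for those quarters.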

The assumed model was: 


$$y_t = \theta_t + e_t, \qquad (5.1)$$

where the $e_t$'s are independent survey errors, with $e_t \sim N(0, s_t^2)$, and 


$$\theta_t = \gamma_0 + \gamma_1 t + \gamma_2 Q_{1t} + \gamma_3 Q_{2t} + \gamma_4 Q_{3t} + \epsilon_t, \qquad (5.2)$$


where $\{\epsilon_t\}$ is ARMA(1,0) with parameters $(\alpha, \sigma^2)$. The regression terms in (5.2) are, 
respectively, the intercept, a term representing the quarter number, with t taking values from -15.5 
to 15.5 linearly in time, and, finally, seasonal terms for the first three quarters of each year, where 


$$Q_{it} = \begin{cases} 1 & \text{if the } t\text{-th observation is in the } i\text{-th quarter;} \\ -1 & \text{if the } t\text{-th observation is in the fourth quarter;} \\ 0 & \text{otherwise.} \end{cases}$$

Better models may be available for these data, although with such a small data set, tests 
of hypotheses against alternative models would not be very powerful. 

To obtain the maximum likelihood estimates for the unknown parameters of this model, 
it is necessary to incorporate the assumptions made about the survey errors in the estimation 
procedure. Most users of official statistics ignore this survey error and implicitly assume that 
the input data are error-free. This does not seriously affect the results when the variance of 
the survey error is small relative to the variance of the model error. 

The survey estimates and the coefficients of variation of the survey errors relative to these 
estimates are given in Tables 1 and 2. The results of the maximum likelihood estimation 
procedure are displayed in Tables 3 and 4. Two estimates are given for each model. The column 
labeled ‘‘Estimate: With Sampling Error’’ uses the method incorporating the assumed error 
structure, whereas the column labeled ‘‘Estimate: Ignoring Sampling Error’’ repeats the 
estimation under the assumption that the survey estimate is observed without error. In both cases 
model (5.2) is assumed. 


Table 1 

Overnight Person-Trips of Saskatchewan Residents to 
Destinations within Saskatchewan¹ 

                 No. of     Survey     Smoothed   Survey       Smoothed     Simulated    Simulated 
                 Rotation   Estimate   Estimate   C.V.         C.V.         RMSE         Bias 
Year   Quarter   Groups     (000's)    (000's)    (%)          (%)          (%)          (%) 
--------------------------------------------------------------------------------------------------- 
1979   Winter    1          598        611        9.6          5.9          6.9          0.1 
       Spring    1          808        813        8.6          4.8          4.9          0.4 
       Summer    3          1033       1103       4.6          3.0          [illegible]  0.0 
       Fall      3          678        683        [illegible]  4.3          4.5          [illegible] 
1980   Winter    1          578        608        9.7          [illegible]  5.8          0.1 
       Spring    3          837        837        4.9          [illegible]  3.6          0.0 
       Summer    1          1451       1169       7.0          [illegible]  [illegible]  0.3 
       Fall      1          744        724        8.9          [illegible]  5.9          0.8 
1981   Winter    3          631        632        5.4          4.3          5.0          -0.1 
       Summer    3          1262       1172       4.2          2.9          [illegible]  0.1 
1982   Winter    1          565        613        9.8          [illegible]  6.4          -0.4 
       Spring    1          901        838        8.3          4.5          [illegible]  0.8 
       Summer    3          1167       1147       4.4          2.9          [illegible]  0.1 
       Fall      1          721        706        9.0          [illegible]  5.6          0.2 
1984   Winter    1          585        598        9.6          5.8          6.7          -1.2 
       Spring    1          788        804        8.7          4.6          [illegible]  -0.4 
       Summer    3          1068       1107       4.5          2.9          3.6          -0.5 
       Fall      1          711        686        9.0          5.3          6.7          0.7 
1986   Winter    1          793        630        8.7          6.2          [illegible]  -1.3 
       Spring    3          798        808        5.0          3.9          3.9          -0.4 
       Summer    3          1053       1096       4.5          3.0          [illegible]  -0.3 
       Fall      3          650        663        5.4          4.4          4.2          0.2 
--------------------------------------------------------------------------------------------------- 

¹ The Canadian Travel Survey was not conducted in the Spring and Fall Quarters of 1981 and during 1983 and 1985. 
Simulations in last two columns are based on a sample size of 100. 
Table 2 

Overnight Person-Trips of Saskatchewan Residents to 
Destinations in Manitoba¹ 

                 No. of     Survey       Smoothed   Survey       Smoothed     Simulated    Simulated 
                 Rotation   Estimate     Estimate   C.V.         C.V.         RMSE         Bias 
Year   Quarter   Groups     (000's)      (000's)    (%)          (%)          (%)          (%) 
----------------------------------------------------------------------------------------------------- 
1979   Winter    1          [illegible]  34         28.6         13.4         14.1         0.5 
       Spring    1          [illegible]  48         26.7         11.0         10.2         0.9 
       Summer    3          78           80         11.4         6.6          [illegible]  [illegible] 
       Fall      3          55           48         12.9         10.1         10.8         0.6 
1980   Winter    1          24           30         29.7         13.6         14.5         0.5 
       Spring    3          63           50         [illegible]  9.5          9.4          0.7 
       Summer    1          86           80         19.0         6.6          6.3          0.8 
       Fall      1          75           46         19.9         11.0         12.2         0.5 
1981   Winter    3          42           34         14.2         11.3         13.2         1.0 
       Summer    3          79           82         11.3         5.9          [illegible]  0.1 
1982   Winter    1          33           34         26.5         12.5         13.2         -2.8 
       Spring    1          46           44         [illegible]  10.7         10.0         1.6 
       Summer    3          78           82         11.4         5.7          5.4          0.1 
       Fall      1          30           42         27.6         10.9         11.4         0.3 
1984   Winter    1          36           34         [illegible]  13.8         16.8         -1.3 
       Spring    1          48           43         23.4         11.4         [illegible]  0.1 
       Summer    3          82           82         11.1         6.1          [illegible]  -0.2 
       Fall      1          30           40         [illegible]  [illegible]  11.4         0.6 
1986   Winter    1          33           33         26.7         16.3         19.9         -0.8 
       Spring    3          38           41         14.6         10.9         [illegible]  -0.1 
       Summer    3          90           81         10.8         [illegible]  8.8          -0.3 
       Fall      3          42           40         14.1         [illegible]  10.5         [illegible] 
----------------------------------------------------------------------------------------------------- 

¹ The Canadian Travel Survey was not conducted in the Spring and Fall Quarters of 1981 and during 1983 and 1985. 
Simulations in last two columns are based on a sample size of 100. 


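Table 3 
Parameter Estimates for Saskatchewan to Saskatchewan Person-Trips¹ 

                        Ignoring 
                        Sampling 
                        Error        With Sampling Error 
                        ---------    ------------------------------------------------------------------- 
                                                  Standard     Simulated    Simulated    t-value 
Parameter               Estimate     Estimate     Error        RMSE         Bias         of Bias 
---------------------------------------------------------------------------------------------------------- 
REGRESSION 
Intercept (γ_0)         831.4        815.0        15.6         14.4         1.8          1.29 
Linear (γ_1)            -0.84        -0.86        [illegible]  [illegible]  -0.10        -0.65 
1st Quarter (γ_2)       -209.6       -203.8       21.8         24.6         -3.5         -1.41 
2nd Quarter (γ_3)       -4.0         [illegible]  [illegible]  23.8         0.4          0.17 
3rd Quarter (γ_4)       340.1        316.0        [illegible]  23.4         -0.4         -0.18 
ARMA 
Autoregressive (α)      0.14         0.47         0.66         0.68         -0.39        -6.77 
Model Variance (σ^2)    7930.5       879.3        1205.6       770.0        -488.2       -8.16 
---------------------------------------------------------------------------------------------------------- 

¹ Simulations and t-values are based on a sample size of 100. 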
Table 4 

Parameter Estimates for Saskatchewan to Manitoba Person-Trips¹ 

                        Ignoring 
                        Sampling 
                        Error        With Sampling Error 
                        ---------    ------------------------------------------------------------------- 
                                                  Standard     Simulated    Simulated    t-value 
Parameter               Estimate     Estimate     Error        RMSE         Bias         of Bias 
---------------------------------------------------------------------------------------------------------- 
REGRESSION 
Intercept (γ_0)         [illegible]  50.5         1.9          2.0          0.4          [illegible] 
Linear (γ_1)            -0.17        -0.13        0.18         0.17         -0.04        -2.01 
1st Quarter (γ_2)       -20.1        -17.2        3.4          [illegible]  -0.6         -1.52 
2nd Quarter (γ_3)       -5.9         -6.1         3.6          [illegible]  -0.1         -0.32 
3rd Quarter (γ_4)       30.7         30.8         [illegible]  [illegible]  0.0          -0.07 
ARMA 
Autoregressive (α)      0.14         -0.75        0.66         0.71         0.49         7.90 
Model Variance (σ^2)    100.0        [illegible]  18.7         [illegible]  -0.3         -0.29 
---------------------------------------------------------------------------------------------------------- 

¹ Simulations and t-values are based on a sample size of 100. 


The estimates of the regression parameters are essentially the same under either assumption. 
However, the estimates of the autoregressive component differ considerably under the two 
assumptions. In particular, the estimated model variance is substantially larger when the 
sampling error is ignored, because the variation due to survey error is then missing from the 
model and is absorbed into the model error. The estimates of the regression coefficients are 
not affected because the estimators for these coefficients remain unbiased when the sampling 
error is ignored, although they are somewhat inefficient. 

Once the parameters of the model have been estimated, it is possible to use the assumed 
model to adjust the individual estimates of the number of overnight person-trips. The results 
discussed below demonstrate how the procedure reduces the coefficients of variation for these 
smoothed estimates when the model assumptions are correct. Such a procedure is analogous 
to model-dependent small area estimation methods. 

The smoothed estimates and their coefficients of variation are given in Tables 1 and 2. These 
coefficients of variation are calculated taking into account the sampling error of the 
regression coefficients, $\gamma_0, \ldots, \gamma_4$. This is possible since, given $\alpha$ and $\sigma^2$, the smoothed estimates 
are linear functions of the original survey estimates, so that the variances can be computed 
from this linear function and the assumed model variance of the regression residuals. 
However, the sampling errors for the estimated $\alpha$ and $\sigma^2$ were ignored at this point. The effect of 
ignoring these sampling errors is discussed below. 
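
The linearity noted above can be made concrete with a small sketch. For fixed values of the autoregressive parameter and the model variance, the code below computes smoothed estimates as a best linear unbiased predictor and returns a mean squared error matrix that includes the contribution from estimating the regression coefficients. It is an illustration in direct matrix form, not the state-space procedure used for the results in this paper. It assumes that the model variance is the innovation variance of a stationary AR(1) process; the function name `smooth` and its arguments are hypothetical.

```python
import numpy as np


def smooth(y, X, t, s2, alpha, sigma2):
    """Smoothed estimates and their MSE matrix for known alpha and sigma2.

    y      : survey estimates at the observed time points
    X      : regression design matrix, one row per observed time point
    t      : observed time indices (gaps between occasions are allowed)
    s2     : sampling variances of the independent survey errors
    alpha  : AR(1) coefficient of the model errors
    sigma2 : AR(1) innovation variance (assumed meaning of the model variance)
    """
    y, X, t, s2 = (np.asarray(v, dtype=float) for v in (y, X, t, s2))
    lags = np.abs(t[:, None] - t[None, :])
    # Stationary AR(1) covariance of the model errors at the observed times.
    S = sigma2 / (1.0 - alpha**2) * alpha**lags
    V = S + np.diag(s2)                      # covariance of y about X @ gamma
    Vinv = np.linalg.inv(V)
    cov_gamma = np.linalg.inv(X.T @ Vinv @ X)
    gamma = cov_gamma @ X.T @ Vinv @ y       # generalized least squares
    K = S @ Vinv                             # shrinkage weights
    theta = X @ gamma + K @ (y - X @ gamma)  # linear in the survey estimates y
    R = X - K @ X
    # Prediction error covariance, including the error from estimating gamma.
    mse = S - K @ S + R @ cov_gamma @ R.T
    return theta, mse
```

Because the survey estimate itself is a linear unbiased predictor with mean squared error equal to its sampling variance, the diagonal of `mse` cannot exceed `s2` when the assumed model holds; when `s2` is zero the smoothed estimates reduce to the survey estimates.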

The smoothed estimates for travel within Saskatchewan are generally close to the original 
survey estimates, with possible exceptions for the Summer of 1980 and the Winter of 1986. 
Those for travel to Manitoba are also close, with a possible exception being the Fall of 1980. 
These exceptional cases could possibly be outliers or could be due to a special event that boosted 
tourism in those quarters. In general, such phenomena could be incorporated into the model 
by: (i) increasing the model variance in the state-space model for those periods or adding 
appropriate dummy variables for special events or (ii) increasing the sampling variance for 
outliers. A more in-depth knowledge of the circumstances would be required to decide whether 
such adjustments are appropriate. The analysis here can help pinpoint possible unusual cases. 
Because the analysis so far has ignored the effect of the sampling error associated with 
estimating $\alpha$ and $\sigma^2$, we performed a simulation study to assess its seriousness. Jones (1979), 
Hamilton (1986) and Tam (1987) have suggested that these sampling errors should not be 
ignored, especially when the time series has few observations. 

For the simulation, we generated sets of random data following the assumed model given 
by (5.1) and (5.2). We took as our parameter values the maximum likelihood estimates of the 
model. The same missing data pattern was used in the simulations as in the original data set. 
One hundred such data sets were generated for each model. In Tables 1 and 2, we report the 
percentage bias of the smoothed values and the percentage root mean squared error for the 
difference between the smoothed values and the true values based on these simulations. 
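
A minimal sketch of how one such data set might be generated is given below. It assumes that Winter is the first quarter of each year, that quarters are indexed 0 to 31 from Winter 1979 to Fall 1986, and that the model variance is the innovation variance of a stationary AR(1) process. The helper names `design` and `simulate_series` and the parameter values in the usage lines are illustrative only.

```python
import numpy as np


def design(q, n_quarters=32):
    """Columns: intercept, centred time, three seasonal terms (-1 in quarter 4)."""
    q = np.asarray(q)
    season = q % 4                                  # 0..3 = first..fourth quarter
    cols = [np.ones(q.size), q - (n_quarters - 1) / 2.0]
    for i in range(3):
        cols.append(np.where(season == i, 1.0,
                             np.where(season == 3, -1.0, 0.0)))
    return np.column_stack(cols)


def simulate_series(gamma, alpha, sigma2, X, q, s2, rng):
    """Draw one (theta, y) pair from the assumed model at observed quarters q."""
    q = np.asarray(q)
    full = np.arange(q.min(), q.max() + 1)          # every quarter, gaps included
    eps = np.empty(full.size)
    eps[0] = rng.normal(0.0, np.sqrt(sigma2 / (1.0 - alpha**2)))  # stationary start
    for i in range(1, full.size):
        eps[i] = alpha * eps[i - 1] + rng.normal(0.0, np.sqrt(sigma2))
    keep = np.searchsorted(full, q)                 # positions of observed quarters
    theta = X @ gamma + eps[keep]
    y = theta + rng.normal(0.0, np.sqrt(s2))        # independent survey errors
    return theta, y


# Observed quarters: 1981 has Winter and Summer only; 1983 and 1985 are missing.
observed = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12, 13, 14, 15,
                     20, 21, 22, 23, 28, 29, 30, 31])
X = design(observed)
rng = np.random.default_rng(0)
theta, y = simulate_series(np.array([800.0, -1.0, -200.0, 0.0, 300.0]),
                           0.5, 900.0, X, observed, np.full(22, 2500.0), rng)
```

Repeating the last call and re-estimating the model on each draw gives the replicates from which the simulated bias and root mean squared error are computed.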

To assess whether 100 was a sufficiently large number of simulations to estimate the root 
mean squared error (RMSE), we computed an estimate of the coefficient of variation of the 
estimator of the RMSE. From the simulations we obtained an unbiased estimate of the variance 
of the estimator of the mean squared error. We then used Taylor linearization to estimate the 
variance of the estimator of the RMSE. The estimated coefficients of variation ranged from 
6% to 11% for destinations within Saskatchewan and from 5% to 9% for destinations in 
Manitoba. Therefore, these estimates of the RMSE’s do provide a reasonable assessment of 
the effect of ignoring the sampling error of the autoregressive parameters. 
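
The linearization step can be sketched as follows, assuming the simulation yields one error (smoothed value minus true value) per replicate for a given quarter. The function name `rmse_with_cv` is hypothetical. The first-order expansion of the square root gives Var(RMSE) ≈ Var(MSE) / (4 MSE), so the coefficient of variation of the RMSE estimator is half that of the MSE estimator.

```python
import numpy as np


def rmse_with_cv(errors):
    """RMSE over replicates and the estimated CV of the RMSE estimator."""
    sq = np.asarray(errors, dtype=float) ** 2
    mse = sq.mean()
    var_mse = sq.var(ddof=1) / sq.size      # unbiased variance of the MSE estimator
    rmse = np.sqrt(mse)
    var_rmse = var_mse / (4.0 * mse)        # first-order Taylor expansion of sqrt
    return rmse, np.sqrt(var_rmse) / rmse
```

For normally distributed errors and R replicates this coefficient of variation is close to the square root of 1/(2R), or about 7% when R = 100, which is of the same order as the range reported above.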

In Tables 1 and 2, the biases of the adjustment procedure are all small and, in fact, for the 
two sets of 22 observations only four were significant at the 5% level using a standard t-test. 

We also note that the percentage root mean squared errors based on the simulations tend 
to be larger than those under the column entitled ‘‘Smoothed C.V.’’. This is to be expected 
since the simulations include sampling errors arising from the estimation of $\alpha$ and $\sigma^2$. 
However, the values of the ‘‘Smoothed C.V.’s’’ do give reasonable approximations to the simulated 
values, so ignoring the effect of the sampling error of $\alpha$ and $\sigma^2$ does not 
seriously affect the coefficients of variation. 

In Tables 3 and 4, we report some simulation results for the estimated parameters. For the 
regression coefficients, only one of the biases was significant at the 5% level. The standard 
errors are all consistent with the simulation results. 

On the other hand, the simulations did point out a problem with the estimates for $\alpha$ and 
$\sigma^2$. The biases for the estimates of $\alpha$ were highly significant. As can be seen from Tables 3 and 
4, one of the biases of $\sigma^2$ was also highly significant. The simulated root mean squared errors 
were not very close to the asymptotic approximation of the standard error obtained by 
inverting the Fisher information matrix. It seems that the sample size for our problem is not 
sufficiently large for the asymptotic approximations to be very accurate. This is a common problem 
for time series analyses of short series. 


6. CONCLUSION 


In cases where the variances of the survey errors are small relative to the variances of the 
model errors, the smoothed estimates would be close to the minimum variance linear unbiased 
estimates and there would be no appreciable reduction in the standard errors of the estimates, 
even when the assumed model is true. However, for cases such as small domain estimation 
where the sampling errors are not small, the standard errors for the smoothed estimates may 
be substantially smaller than those for the original survey estimates. For example, the smoothed 
estimates for the Saskatchewan-to-Manitoba data showed a greater improvement than the 
Saskatchewan-to-Saskatchewan data, since the sampling errors for the survey data were larger 
for the former data set. 
One of the implications of assuming models for repeated surveys is that if the models are 
misspecified, the MMSE estimators may be seriously biased. It is important, therefore, to 
choose a model which is both consistent with the data and which reflects subject matter knowl- 
edge about the underlying phenomena. In our example the data set is small, so that a large 
number of statistical models would be consistent with the data. 

Our simulation studies suggest that even for small data sets, the asymptotic approximations 
to the variances of the smoothed estimates are quite reasonable. However, as in the case of 
more traditional applications of time series analyses, the asymptotic approximations for the 
sampling errors of the parameter estimates may be poor. 
Sample Allocation in Multivariate Surveys 


JAMES BETHEL¹ 


ABSTRACT 


The optimum allocation to strata for multipurpose surveys is often solved in practice by establishing linear 
variance constraints and then using convex programming to minimize the survey cost. Using the Kuhn- 
Tucker theorem, this paper gives an expression for the resulting optimum allocation in terms of Lagrangian 
multipliers. Using this representation, the partial derivative of the cost function with respect to the k-th 
variance constraint is found to be $-2\alpha_k^* g(x^*)/v_k$, where $g(x^*)$ is the cost of the optimum allocation 
and where $\alpha_k^*$ and $v_k$ are, respectively, the k-th normalized Lagrangian multiplier and the upper bound 
on the precision of the k-th variable. Finally, a simple computing algorithm is presented and its convergence 
properties are discussed. The use of these results in sample design is demonstrated with data from a survey 
of commercial establishments. 


KEY WORDS: Multiple objective sample allocation; Nonlinear programming; Stratified sampling. 


1. INTRODUCTION 


The problem of optimum sample allocation in surveys with multiple study objectives was 
first discussed by Neyman (1934) in his development of the theory for solving the univariate 
optimum allocation problem. Since then, many researchers have studied the multivariate 
problem and several approaches have been suggested, most of which fall into one of two cate- 
gories. The first involves forming a weighted average of the stratum variances and finding the 
optimal allocation for the ‘‘average variance’’ which results. Dalenius (1953), Yates (1960), 
Folks and Antle (1965), Hartley (1965), and Kish (1976) discuss methods related to this 
approach. The second basic technique is to require that each variance satisfy an inequality con- 
straint and then use convex programming to obtain the least cost allocation which satisfies all 
the constraints. Dalenius (1957), Yates (1960), Kokan (1963), Hartley (1965), Kokan and Khan 
(1967), Chatterjee (1968,1972), Huddleston, Claypool, and Hocking (1970), Bethel (1985), and 
Chromy (1987) all discuss the use of convex programming in relation to the multivariate optimal 
allocation problem. Each approach has its advantages and disadvantages. The ‘‘weighted 
average’’ method is computationally simple, intuitively appealing, and can be solved under 
a fixed cost assumption, but the choice of the weights is arbitrary and the optimality properties 
are not clear. The ‘‘convex programming’’ approach gives the optimal solution to the defined 
problem but the resulting cost may not be acceptable so that a further search is usually required 
for an optimal solution which falls within the budgetary constraints. 

In this paper, a closed expression for the optimal allocation subject to linear inequality con- 
straints will be given in terms of Lagrangian multipliers. In this framework, two results easily 
follow which substantially overcome the disadvantages of the convex programming approach. 
The first is that scaling the optimal multivariate allocation results in an allocation which is 
optimal under constraints which are proportionate to the original ones. Thus, if the optimal 
solution is too costly, it can be scaled down to the allowable budget directly and the effects 
of this on the precision of sample estimates can be directly determined. The second result is 
a simple expression for the partial derivatives of the cost of the sampling allocation with 
respect to the variance constraints. These quantities, called ‘‘shadow prices’’, show the sen- 
sitivity of the cost to variance constraints and are useful in assessing the cost effectiveness 
of the sample design. 

The problem of solving the convex optimization still remains. Much has been written on 
methods for solving programming problems of this type and there are many software packages 
available for doing so. Some special programming considerations will be discussed here, how- 
ever, and a simple method will be presented. This algorithm, essentially a steepest descent 
procedure, is convergent, straightforward to program, and easy to use, since no initial values 
are required. An example will be presented which demonstrates this algorithm and the other 
techniques discussed above. 


2. THE ALLOCATION MODEL 


Consider the case of stratified random sampling with I strata and J variables. Suppose it 
is required that the j-th variable satisfy 


$$\mathrm{Var}(\bar{y}_j) = \sum_{i=1}^{I} W_i^2 S_{ij}^2 / n_i \le v_j^2, \qquad (1)$$


where $S_{ij}^2$, $n_i$, and $W_i$ are, respectively, the variance of the j-th response variable, the sample 
allocation, and the proportion of the population that fall in the i-th stratum, and where $v_j$ 
is an arbitrary, positive constant. In this paper it will be assumed that the finite population 
correction factors are negligible. In practice, it is expected that the effects of this 
assumption, which will be discussed in more detail in Section 7, would be limited. 

Let 


$$x_i = \begin{cases} 1/n_i & \text{if } n_i \ge 1, \\ \infty & \text{otherwise,} \end{cases}$$


and assume the cost function 

$$g(x) = \sum_{i=1}^{I} c_i / x_i, \qquad c_i > 0, \; i = 1, \ldots, I. \qquad (2)$$

A constant term for fixed costs could be included, but this would not affect the 
minimization process and is deleted here to simplify the notation. Define the constants 


$$a_{ij} = W_i^2 S_{ij}^2 / v_j^2, \qquad (3)$$


which will be referred to as ‘‘standardized precision units’’. Notice that $a_{ij} \ge 0$. Using this 
notation, the optimal allocation problem can be expressed as follows: 


$$\begin{aligned} &\text{Minimize } g(x) \\ &\text{subject to } a_j' x \le 1, \quad j = 1, \ldots, J, \qquad (4) \\ &\phantom{\text{subject to }} x > 0, \end{aligned}$$


where $a_j$ is the j-th column vector of the matrix $A = \{a_{ij}\}$. 
Kokan (1963) discusses this allocation model extensively and shows how it can be adapted 
to cover many common sample allocation problems, including cluster sampling and double 
sampling. Kokan and Khan (1967) give further analytical results in this context; Arthanari and 
Dodge (1981) restate Kokan and Khan’s results. In related work, Kish (1976) describes a class 
of ‘‘linear forms’’ which occur frequently in survey research and to which many of the results 
developed here will apply. 


3. THE OPTIMUM ALLOCATION 


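The optimum allocation for a single variable is well known. In that case J = 1, and the 
minimum of g(x) subject to $a_1'x \le 1$ with x > 0, denoted by x*, is given by 


$$x_i^* = \begin{cases} \sqrt{c_i} \Big/ \Big( \sqrt{a_{i1}} \sum_{k=1}^{I} \sqrt{c_k a_{k1}} \Big) & \text{if } a_{i1} > 0, \; i = 1, 2, \ldots, I, \\ \infty & \text{otherwise.} \end{cases} \qquad (5)$$


In this section, formula (5) will be extended to the situation where J > 1. 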

The function g in (2) is strictly convex for x > 0, and the constraints given by (4) are linear, 
so that the basic results in convex programming apply here without difficulty. That an optimal 
solution always exists was demonstrated by Kokan and Khan (1967). As above, denote the 
optimal solution by x*. It follows from the Kuhn-Tucker Theorem (1951) that there exist 
$\lambda_j \ge 0$ such that 

$$\nabla g(x^*) + \sum_{j=1}^{J} \lambda_j a_j = 0 \qquad (6)$$

($\nabla$ denotes the gradient) and 

$$\lambda_j (a_j' x^* - 1) = 0 \qquad (7)$$

for j = 1, 2, ..., J. If x > 0 satisfies $\sum_{j=1}^{J} \lambda_j a_j' x \le \sum_{j=1}^{J} \lambda_j$, then, combining (6) and (7), 

$$-x' \nabla g(x^*) = \sum_{j=1}^{J} \lambda_j a_j' x \le \sum_{j=1}^{J} \lambda_j = \sum_{j=1}^{J} \lambda_j a_j' x^* = -x^{*\prime} \nabla g(x^*). \qquad (8)$$

By convexity, $g(x) - g(x^*) \ge (x - x^*)' \nabla g(x^*)$ (for all x > 0 with x* > 0). Thus, from 
(8), 

$$g(x) - g(x^*) \ge (x - x^*)' \nabla g(x^*) \ge 0.$$

It follows that x* is the minimum of g(x) subject to the conditions 

$$\sum_{j=1}^{J} \lambda_j a_j' x \le \sum_{j=1}^{J} \lambda_j \quad \text{and} \quad x > 0.$$

Since the minimization of g is unaffected by positive multiplicative constants, x* also 
minimizes g(x) subject to the constraints that $\sum_{j=1}^{J} \alpha_j^* a_j' x \le 1$ and x > 0, where $\alpha_j^* = \lambda_j / \sum_{k=1}^{J} \lambda_k$. 
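The extension of formula (5) to an expression for the optimum multivariate allocation now 
consists of applying the former to the weighted sum $\sum_{j=1}^{J} \alpha_j^* a_j$: 


$$x_i^* = \begin{cases} \sqrt{c_i} \Big/ \Big( \sqrt{\textstyle\sum_{j=1}^{J} \alpha_j^* a_{ij}} \; \sum_{k=1}^{I} \sqrt{c_k \textstyle\sum_{j=1}^{J} \alpha_j^* a_{kj}} \Big) & \text{if } \sum_{j=1}^{J} \alpha_j^* a_{ij} > 0, \; i = 1, 2, \ldots, I, \\ \infty & \text{otherwise.} \end{cases} \qquad (9)$$
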


Notice that since x* minimizes g(x) subject to $a_j'x \le 1$, with x > 0 for 1 ≤ j ≤ J, it 
follows that mx* minimizes g(mx) subject to the constraints $a_j'(mx) \le m$, with x > 0 for 
1 ≤ j ≤ J. Thus, as noted earlier, constraints on variances (or CV's) can be scaled by a factor 
m (or $\sqrt{m}$) if survey costs are too high. 

Formula (9), of course, is computationally useful only if the $\alpha_j^*$ are known. However, this 
formula is useful for deriving the shadow prices and for developing an algorithm for obtaining 
x* and the $\alpha_j^*$. 
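
The following sketch evaluates the allocation formula for a given set of weights and illustrates the scaling property. It assumes the standardized precision units are held in an I by J array `a` and the unit costs in a vector `c`; the function names `x_tilde` and `cost` are illustrative.

```python
import numpy as np


def x_tilde(alpha, a, c):
    """Allocation x_i = 1/n_i for weights alpha (np.inf where no sample is needed)."""
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    w = a @ np.asarray(alpha, dtype=float)      # sum_j alpha_j a_ij, per stratum
    total = np.sqrt(c * w).sum()                # sum_k sqrt(c_k sum_j alpha_j a_kj)
    x = np.full(w.shape, np.inf)
    pos = w > 0
    x[pos] = np.sqrt(c[pos]) / (np.sqrt(w[pos]) * total)
    return x


def cost(x, c):
    """g(x) = sum_i c_i / x_i, that is, sum_i c_i n_i."""
    return float(np.sum(np.asarray(c, dtype=float) / x))


a = np.array([[4.0, 1.0], [1.0, 9.0]])          # made-up precision units
c = np.array([1.0, 1.0])
x = x_tilde([0.3, 0.7], a, c)
print(cost(x, c), cost(2 * x, c))               # doubling x halves the cost
```

For any weights that sum to one, the weighted constraint is met with equality, and multiplying x by m multiplies every variance by m while dividing the cost by m.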


4. SENSITIVITY OF SURVEY COST TO VARIANCE CONSTRAINTS 


In many optimization problems, it is useful to know how the optimal solution behaves when 
the constraints are perturbed slightly. This can be especially true in survey research, where trade- 
offs between costs, survey operations and precision requirements are frequently required. In 
any case, the ‘‘shadow prices’’, given by $\partial g(x^*)/\partial v_k$, are useful in detecting small shifts in the 
variance constraints which could substantially reduce the overall survey cost. 

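Combining (2), (3), and (9), it is easily seen that the cost of the optimum allocation is 


$$g(x^*) = \Big[ \sum_{i=1}^{I} \sqrt{c_i \textstyle\sum_{j=1}^{J} \alpha_j^* a_{ij}} \Big]^2 = \Big[ \sum_{i=1}^{I} \sqrt{c_i \textstyle\sum_{j=1}^{J} \alpha_j^* W_i^2 S_{ij}^2 / v_j^2} \Big]^2, \qquad (10)$$


so that 


$$\begin{aligned} \frac{\partial g(x^*)}{\partial v_k} &= -2 \Big[ \sum_{i=1}^{I} \sqrt{c_i \textstyle\sum_{j=1}^{J} \alpha_j^* W_i^2 S_{ij}^2 / v_j^2} \Big] \sum_{i=1}^{I} \frac{c_i \alpha_k^* W_i^2 S_{ik}^2 / v_k^3}{\sqrt{c_i \sum_{j=1}^{J} \alpha_j^* W_i^2 S_{ij}^2 / v_j^2}} \qquad (11) \\ &= -\frac{2\alpha_k^*}{v_k} \Big[ \sum_{i=1}^{I} \sqrt{c_i \textstyle\sum_{j=1}^{J} \alpha_j^* a_{ij}} \Big] \sum_{i=1}^{I} \frac{c_i a_{ik}}{\sqrt{c_i \sum_{j=1}^{J} \alpha_j^* a_{ij}}} \\ &= -\frac{2\alpha_k^*}{v_k} \, g(x^*) \, a_k' x^*. \end{aligned}$$
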
Vk 
From (7) it follows necessarily that $\alpha_k^* = 0$ whenever $a_k'x^* < 1$, so that 


$$\frac{\partial g(x^*)}{\partial v_k} = -\frac{2\alpha_k^*}{v_k} \, g(x^*). \qquad (12)$$


This formula is somewhat more complicated than the usual expression for shadow prices (e.g., 
see Luenberger 1984), due to the complex relationship between g and $v_k$. 


Now consider increasing $v_k$ by $(100\gamma)\%$, $0 < \gamma < 1$. Denote by $x^* + \Delta x^*$ the resulting 
perturbation in x*. By (12), 


$$g(x^* + \Delta x^*) - g(x^*) \approx \frac{\partial g(x^*)}{\partial v_k} (\gamma v_k) = -(2\gamma\alpha_k^*) \, g(x^*). \qquad (13)$$


Thus an increase of $(100\gamma)\%$ in the k-th variance constraint results in a $(100)(2\gamma\alpha_k^*)\%$ 
reduction in the overall survey cost. 
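
As a small numerical illustration of this rule, the sketch below applies it to the normalized multipliers (.6660 and .3340) and the total sample size (241, with unit costs) reported for the first-pass solution in Table 2 of Section 6. The function name `cost_change` is hypothetical.

```python
def cost_change(alpha_k: float, total_cost: float, rel_increase: float) -> float:
    """First-order change in cost when the k-th constraint v_k rises by rel_increase."""
    return -2.0 * rel_increase * alpha_k * total_cost


print(round(cost_change(0.6660, 241, 0.10)))  # floorspace constraint
print(round(cost_change(0.3340, 241, 0.10)))  # building age constraint
```

The two values, -32 and -16, match the 10% shadow prices listed in Table 2; with the total reduced to 200 the same rule gives -27 and -13, as in Table 3.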


5. PROGRAMMING CONSIDERATIONS 
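This section discusses some technical aspects of solving for x* and gives a simple algorithm 
for finding both x* and the coefficients $\alpha^*$ by searching over weighted averages $\sum_{j=1}^{J} \alpha_j a_j$. 
Define $\delta_{ij}$ by 


$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j, \end{cases}$$


and write $\delta_k = (\delta_{k1}, \ldots, \delta_{kJ})'$. For a vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_J)'$, define $\tilde{x}(\alpha)$ by 


$$\tilde{x}_i(\alpha) = \begin{cases} \sqrt{c_i} \Big/ \Big( \sqrt{\textstyle\sum_{j=1}^{J} \alpha_j a_{ij}} \; \sum_{k=1}^{I} \sqrt{c_k \textstyle\sum_{j=1}^{J} \alpha_j a_{kj}} \Big) & \text{if } \sum_{j=1}^{J} \alpha_j a_{ij} > 0, \; i = 1, 2, \ldots, I, \\ \infty & \text{otherwise.} \end{cases}$$


Notice that $\tilde{x}(\alpha^*) = x^*$. Now the iterative algorithm for finding x* is defined as follows: 


1. Take $\alpha_j^{(1)} = \delta_{1j}$, 1 ≤ j ≤ J. 


2. At step n ≥ 1, find an index k for which 

$$(a_k - a_j)' \tilde{x}(\alpha^{(n)}) \ge 0, \qquad 1 \le j \le J. \qquad (14)$$


This gives the constraint which the current optimum solution violates by the largest margin. 
If $a_k'\tilde{x}(\alpha^{(n)}) \le 1$, then terminate the algorithm. Otherwise, find $t^{(n)} \in (0,1)$ for which 


$$g(\tilde{x}(t^{(n)}\delta_k + (1 - t^{(n)})\alpha^{(n)})) \ge g(\tilde{x}(t\delta_k + (1 - t)\alpha^{(n)})) \quad \text{for all } t \in [0,1]. \qquad (15)$$


3. Take $\alpha^{(n+1)} = t^{(n)}\delta_k + (1 - t^{(n)})\alpha^{(n)}$. 


4. Terminate when $|\alpha_j^{(n+1)} - \alpha_j^{(n)}| < \epsilon$, 1 ≤ j ≤ J, where $\epsilon$ is a predetermined 
convergence criterion. 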
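
Steps 1-4 can be sketched in Python as below. This is one possible reading of the algorithm rather than the author's program: the line search is the binary search on the violated constraint described later in this section, a small tolerance is added to the stopping test in Step 2 because the search is only approximate, and the names `x_tilde`, `constraint_values` and `allocate` are illustrative.

```python
import numpy as np


def x_tilde(alpha, a, c):
    """Allocation x_i = 1/n_i for weights alpha (np.inf where no sample is needed)."""
    w = a @ alpha
    x = np.full(w.shape, np.inf)
    pos = w > 0
    x[pos] = np.sqrt(c[pos] / w[pos]) / np.sqrt(c * w).sum()
    return x


def constraint_values(x, a):
    """Left-hand sides a_j'x of the J variance constraints."""
    with np.errstate(invalid="ignore"):
        terms = np.where(a > 0, a * x[:, None], 0.0)   # treat 0 * inf as 0
    return terms.sum(axis=0)


def allocate(a, c, eps=1e-8, search_steps=40, max_iter=1000):
    """Return (alpha, x) from the iterative search over weighted constraints."""
    a = np.asarray(a, dtype=float)
    c = np.asarray(c, dtype=float)
    J = a.shape[1]
    alpha = np.zeros(J)
    alpha[0] = 1.0                                     # Step 1: first constraint only
    for _ in range(max_iter):
        lhs = constraint_values(x_tilde(alpha, a, c), a)
        k = int(np.argmax(lhs))                        # Step 2: largest violation
        if lhs[k] <= 1.0 + eps:
            break                                      # all constraints satisfied
        delta = np.zeros(J)
        delta[k] = 1.0
        lo, hi = 0.0, 1.0
        for _ in range(search_steps):                  # binary search for t
            t = 0.5 * (lo + hi)
            trial = t * delta + (1.0 - t) * alpha
            if constraint_values(x_tilde(trial, a, c), a)[k] >= 1.0:
                lo = t                                 # still violated: raise t
            else:
                hi = t
        new_alpha = t * delta + (1.0 - t) * alpha      # Step 3
        converged = np.max(np.abs(new_alpha - alpha)) < eps   # Step 4
        alpha = new_alpha
        if converged:
            break
    return alpha, x_tilde(alpha, a, c)
```

For a small made-up problem with three strata and three variables, `allocate(np.array([[4, 1, .5], [1, 9, .5], [1, 1, .2]]), np.ones(3))` stops after one line search with the first two constraints active and a zero weight on the third.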


To verify the convergence of the algorithm, first note that $\tilde{x}(\alpha)$ minimizes g(x) subject 
to $\sum_{j=1}^{J} \alpha_j a_j' x \le 1$. Then, since $\sum_{j=1}^{J} \alpha_j a_j' x^* \le \sum_{j=1}^{J} \alpha_j = 1$, 


$$g(\tilde{x}(\alpha^{(n)})) \le g(x^*) \qquad (16)$$

for all n. Furthermore, from (15), $g(\tilde{x}(\alpha^{(n)}))$ is nondecreasing, implying the convergence of 
$g(\tilde{x}(\alpha^{(n)}))$. To see that $\tilde{x}(\alpha^{(n)}) \to x^*$, first define 


$$h_{k\alpha}(t) = \sum_{i=1}^{I} \sqrt{c_i \textstyle\sum_{j=1}^{J} (t\delta_{kj} + (1 - t)\alpha_j) a_{ij}} = \sqrt{g(\tilde{x}(t\delta_k + (1 - t)\alpha))}. \qquad (17)$$


Since $h_{k\alpha}(t)$ is concave (i.e., $-h_{k\alpha}(t)$ is convex), 


$$\begin{aligned} h_{k\alpha}(t) - h_{k\alpha}(0) &= t\,h'_{k\alpha}(0) + O(t^2) \qquad (18) \\ &= \frac{t}{2} \sum_{i=1}^{I} \frac{c_i \sum_{j=1}^{J} (\delta_{kj} - \alpha_j) a_{ij}}{\sqrt{c_i \sum_{j=1}^{J} \alpha_j a_{ij}}} + O(t^2) \\ &= (t/2) \sqrt{g(\tilde{x}(\alpha))} \, (a_k'\tilde{x}(\alpha) - 1) + O(t^2). \end{aligned}$$


By allowing t to tend toward zero, it follows that there exists $t \in (0,1)$ for which 

$$\sqrt{g(\tilde{x}(t\delta_k + (1 - t)\alpha))} = h_{k\alpha}(t) > h_{k\alpha}(0) = \sqrt{g(\tilde{x}(\alpha))}$$

if and only if $a_k'\tilde{x}(\alpha) > 1$. Thus it follows from (15) that the constraints are satisfied at 
convergence; combining this with (16) implies that $\lim_{n \to \infty} \tilde{x}(\alpha^{(n)}) = x^*$. 


In carrying out the algorithm, Step 2 requires a search for $t^{(n)}$. Define $h_{k\alpha}(t)$ as in 
(17). It is clear from the preceding discussion that $a_k'\tilde{x}(t\delta_k + (1 - t)\alpha^{(n)}) = 1$ when $h_{k\alpha}(t)$ 
(and hence g) is at a maximum. Furthermore, since $h_{k\alpha}(t)$ is strictly concave, $h'_{k\alpha}(t)$ is 
nonincreasing in t and thus the point where $h'_{k\alpha}(t) = 0$ is unique. It follows that a binary search 
for the point where $h_{k\alpha}(t)$ is maximized can be implemented by simply checking to see 
whether $a_k'\tilde{x}(t\delta_k + (1 - t)\alpha^{(n)}) \ge 1$, providing a rapid means of obtaining a close 
approximation for $t^{(n)}$. 

As described above, the algorithm takes $a_1$ as the initial value. This is completely 
arbitrary, since any of the $a_j$, 1 ≤ j ≤ J, would do. In practice, the constraint for which the 
optimum allocation (i.e., formula (5)) yields the highest cost is generally a good choice for 
the starting value. 
Notice that Step 2 of the algorithm will require IJ calculations in formula (14) and a 
10-step (say) binary search of 3J + J + 1 calculations each in formula (15), while J calculations 
must be carried out in Step 3. Thus each iteration of the algorithm is O(IJ). From (18), at 
the n-th iteration, 


$$h'_{k\alpha}(0) = \tfrac{1}{2} \sqrt{g(\tilde{x}(\alpha^{(n)}))} \, (a_k'\tilde{x}(\alpha^{(n)}) - 1),$$


so that $a_k'\tilde{x}(\alpha^{(n)})$ is approximately proportionate to $h'_{k\alpha}(0)$ (up to an additive constant). 
Heuristically, $h'_{k\alpha}(0)$ is the ‘‘slope’’ of h in the direction of $a_k$, suggesting that the algorithm 
is essentially a steepest descent (or ascent, in this case) procedure. This, in turn, suggests a linear 
rate of convergence (see, for example, Forsyth 1968). 

In the author’s experience (see Bethel 1985), the algorithm converges quickly for most 
moderately sized problems. For example, sample allocation problems with 20-30 strata and 
5-10 constraints were solved in 3-5 seconds using the algorithm (on a Compaq 386/20 with an 
80387 math co-processor) versus 6-8 seconds using a sequential unconstrained minimization 
technique (SUMT) implementing a penalized steepest descent algorithm. Run times vary con- 
siderably depending on the magnitude of the problem, the number of active constraints, and, 
obviously, machine characteristics. The author’s computing experience (with problems of 20- 
30 strata and 5-10 constraints) includes the Macintosh SE (30 seconds to 2 or 3 minutes), Leading 
Edge Model D (1 to 5 minutes), Zilog System 8000 (5 to 60 seconds), and the Compaq men- 
tioned above (5 to 10 seconds). However, the run times are generally insignificant in comparison 
with the labor involved in creating files and other preparatory tasks. In particular, it may take 
several hours to find an acceptable starting value for the SUMT algorithm. Thus a strong feature 
of the algorithm described in Steps 1-4 above is that it requires no external initial values. 
Moreover, it is relatively easy to program, requiring only 40 or 50 lines of code. 

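An even simpler algorithm is given by Chromy (1987). It can be adapted to our notation 
and general approach as follows: Set $\alpha_j^{(1)} = 1/J$, and, for n ≥ 2, let 

$$\alpha_j^{(n)} = \alpha_j^{(n-1)} \big[ a_j'\tilde{x}(\alpha^{(n-1)}) \big]^2 \Big/ \sum_{k=1}^{J} \alpha_k^{(n-1)} \big[ a_k'\tilde{x}(\alpha^{(n-1)}) \big]^2. \qquad (19)$$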


Like the algorithm described in steps 1-4 above, (19) requires no external initial values; (19), 
however, requires even less programming effort and, based on several comparisons, it appears 
to converge considerably more quickly. Unfortunately, there is apparently no formal proof 
of convergence, although considerable practical experience (see Chromy 1987 for a more 
detailed discussion) suggests that it has good convergence properties. 


6. EXAMPLE 


Tables 1-3 present an example drawn from a survey of commercial establishments. (Only 
the strata for educational institutions are shown here.) Four of the primary variables of interest 
are given: area of enclosed floorspace, age of building, number of full-time employees, and 
percent of buildings heated by oil. Table 1 gives the stratum level variance information. Here 
the standardized precision units are computed as 


$$a_{ij} = \frac{W_i^2 S_{ij}^2}{v_j^2 \bar{Y}_j^2},$$

Table 1. 

Allocation Example: Survey of Educational Institutions. 

                      Standard Deviation 
          Stratum     --------------------------------------------------------- 
Stratum   Weight      Floorspace     Age        Employees      Pct. Oil Heating 
------------------------------------------------------------------------------- 
1         .5158       [illegible]    43.71      [illegible]    48.15 
2         .2632       24,056.21      16.68      27.09          36.79 
3         .1184       54,201.75      24.70      [illegible]    48.04 
4         .0711       155,514.21     16.01      59.46          38.07 
5         .0184       [illegible]    14.74      [illegible]    48.80 
6         .0132       355,392.69     20.90      [illegible]    57.74 
------------------------------------------------------------------------------- 
Mean:                 54,641.85      43.03      45.23          67.58 
v_k:                  .06            .06        .06            .06 

                      Standardized Precision Units 
                      --------------------------------------------------------- 
Stratum               Floorspace     Age        Employees      Pct. Oil Heating 
------------------------------------------------------------------------------- 
1                     12.33          76.24      23.90          [illegible] 
2                     3.73           2.89       6.93           5.70 
3                     3.83           1.28       [illegible]    1.96 
4                     11.36          .19        2.44           .45 
5                     [illegible]    .01        [illegible]    .05 
6                     2.03           .01        1.06           .04 
------------------------------------------------------------------------------- 
Required 
Sample Size:          [illegible]    149        [illegible]    121 

where $v_j = .06$ for all variables (so that the half-width of a 90% confidence interval will be 
approximately 10% of the mean). Also given are the sample sizes required for Neyman 
allocation for each of the variables taken individually. Survey costs are assumed to be constant across 
strata. 
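
The single-variable sample sizes can be checked with a short sketch. It uses the stratum weights and the standard deviations and means for building age and percent oil heating as printed in Table 1, takes the standardized precision units to be $W_i^2 S_{ij}^2 / (v_j^2 \bar{Y}_j^2)$, and assumes unit costs, so that the minimum total sample size for one constraint is the square of the sum of the square roots of its precision units. The function names are illustrative.

```python
import numpy as np

W = np.array([.5158, .2632, .1184, .0711, .0184, .0132])      # stratum weights
S_age = np.array([43.71, 16.68, 24.70, 16.01, 14.74, 20.90])   # SD, building age
S_oil = np.array([48.15, 36.79, 48.04, 38.07, 48.80, 57.74])   # SD, pct. oil heating
mean_age, mean_oil, v = 43.03, 67.58, 0.06


def precision_units(W, S, mean, v):
    """Standardized precision units for one variable under a CV constraint v."""
    return (W * S) ** 2 / (v * mean) ** 2


def neyman_total(a):
    """Minimum total sample size for a single constraint with unit costs."""
    return float(np.sqrt(a).sum() ** 2)


a_age = precision_units(W, S_age, mean_age, v)
a_oil = precision_units(W, S_oil, mean_oil, v)
print(round(neyman_total(a_age)), round(neyman_total(a_oil)))
```

With these inputs the totals round to 149 and 121, the required sample sizes shown in Table 1 for age and percent oil heating; the individual precision units agree with the table up to rounding of the inputs.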

Table 2 gives the first-pass solution, which requires a sample of 241 units. The normalized 
Lagrangian coefficients and the achieved precision levels are given, from which it is apparent 
that floorspace and building age are dominating the solution while the other variables are not 
‘‘active’’. Here the starting value $\alpha^{(1)} = (1,0,0,0)$ was used; because the third and fourth 
constraints were always satisfied, there was only one iteration with a 9-step binary search for $t^{(1)}$. 
(The successive estimates for the optimal t were 1/2, 1/4, 3/8, 5/16, 11/32, 21/64, 43/128, 
85/256, and 171/512.) Also given in Table 2 are the 10% shadow prices: 10% increases in the 
first (or second) constraints would result in a sample size reduction of approximately 32 (or 
16) units. Since the third and fourth constraints are not active in the solution, changing their 
CV requirements would have no effect on the allocation or the sampling costs. 

Table 3 gives a second pass solution under the requirement that the total sample size is no 
larger than 200. The optimal solutions are thus scaled by 241/200 (so that the optimal 
allocation goes down by 200/241) and the resulting CV’s are scaled by $\sqrt{241/200}$. The new 10% 
shadow prices are -27 and -13 for the first and second constraints, reflecting the decrease in 
the overall survey cost. Notice that there is approximately a 10% increase in the CV’s (from 
the original ones in Table 1), so that the sample reduction of 48 predicted by the shadow prices 
in Table 2 compares favorably with the actual 41 unit reduction. (The shadow price predic- 
tions will always be somewhat optimistic due to the linear approximation.) 
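The shadow prices quoted here can be reproduced to the nearest unit from the normalized multipliers alone. The sketch below assumes the first-order relation "change in sample size = -2 x (relative CV increase) x multiplier x total sample size". This relation is not stated in this section; it is an assumption that follows from treating the variance bound as proportional to the squared CV, and it agrees with the tabulated values. The function name is hypothetical.

```python
from math import sqrt

# Normalized Lagrangian multipliers reported in Tables 2 and 3.
ALPHA = [0.6660, 0.3340, 0.0, 0.0]


def shadow_prices(alpha, total_n, cv_increase=0.10):
    """First-order change in total sample size when one CV bound is relaxed.

    Assumption: the variance bound scales with CV squared, so a small
    relative CV increase d changes the bound by roughly 2*d.
    """
    return [-2 * cv_increase * a * total_n for a in alpha]


first_pass = [round(p) for p in shadow_prices(ALPHA, 241)]
second_pass = [round(p) for p in shadow_prices(ALPHA, 200)]

predicted_reduction = -sum(first_pass)   # both active CVs relaxed by about 10%
actual_reduction = 241 - 200
cv_scale = sqrt(241 / 200)               # factor applied to the achieved CVs

print(first_pass, second_pass)
print(predicted_reduction, actual_reduction, round(0.06 * cv_scale, 4))
```

The predicted reduction of 48 units against the actual 41 shows the optimism of the linear approximation noted in the text.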
Table 2.
Allocation Example: First Pass Optimum Solution.

Stratum    $\sum_j \alpha_j^* a_{ij}$    $x_i^*$    Optimum Allocation
   1              33.6749                 .0111            90
   2               3.4495                 .0347            29
   3               2.9783                 .0373            27
   4               7.6294                 .0233            43
   5               4.9119                 .0291            34
   6               1.3554                 .0553            18
Total:                                                    241

                          Floorspace     Age     Employees    Pct. Oil Heating
Lagrangian Multiplier
(Normalized):               .6660       .3340      .0000          .0000
Achieved Precision:         .0600       .0600      .0481          .0502
10% Shadow Prices:           -32         -16         0              0
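The allocation column of Table 2 can be recovered from the second column. The sketch below assumes equal unit costs (as stated for this example) and treats the second column as the multiplier-weighted precision units $A_i = \sum_j \alpha_j^* a_{ij}$. Minimizing $\sum_i n_i$ subject to $\sum_i A_i/n_i \le 1$ then gives $n_i = \sqrt{A_i}\,\sum_k \sqrt{A_k}$. The closed form is a standard Lagrangian result used here for illustration, not a transcription of the paper's algorithm, and the function name is hypothetical.

```python
from math import sqrt

# Second column of Table 2 (multiplier-weighted standardized precision units).
COMBINED_UNITS = [33.6749, 3.4495, 2.9783, 7.6294, 4.9119, 1.3554]


def equal_cost_allocation(units):
    """Continuous allocation minimizing total n subject to sum(units[i]/n[i]) <= 1."""
    roots = [sqrt(a) for a in units]
    total_root = sum(roots)
    return [r * total_root for r in roots]


n_star = equal_cost_allocation(COMBINED_UNITS)
rounded = [round(n) for n in n_star]

# Scale the continuous solution down to a total of 200 units (second pass).
scaled = [round(n * 200 / sum(n_star)) for n in n_star]

print(rounded, sum(rounded))
print(scaled, sum(scaled))
```

Rounding the scaled solution stratum by stratum gives a total of 201 rather than 200, which is the total shown in Table 3.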


Table 3.
Allocation Example: Optimum Solution for Sample Size Limited to 200.

Stratum    $\sum_j \alpha_j^* a_{ij}$    $x_i^*$    Optimum Allocation
   1              33.6749                 .0134            75
   2               3.4495                 .0418            24
   3               2.9783                 .0449            22
   4               7.6294                 .0281            36
   5               4.9119                 .0351            29
   6               1.3554                 .0666            15
Total:                                                    201

                          Floorspace     Age     Employees    Pct. Oil Heating
Lagrangian Multiplier
(Normalized):               .6660       .3340      .0000          .0000
Achieved Precision:         .0657       .0658      .0528          .055
10% Shadow Prices:           -27         -13         0              0
7. DISCUSSION


In this paper we have given a formal representation for the optimal sample allocation for
a multipurpose survey with linear variance constraints, and derived expressions for the partial
derivatives of the cost function with respect to the precision constraints. The latter result,
in particular, provides approximations that are useful in survey planning, permitting a great
deal of exploratory work without exact computer calculations.

Throughout the paper, the normalized Lagrangian multipliers, $\alpha_j^*$, play a key role. In
particular, we have noted that whenever the j-th variance constraint is not “active” in the
solution to the allocation problem, the j-th Lagrangian multiplier $\alpha_j^* = 0$.

The optimization approach discussed in this article yields a continuous solution, which 
must then be rounded in some way to provide integer stratum sample sizes. Clearly this 
rounding will cause some deviation from optimality. However, the objective function here 
is generally considered to be rather insensitive to small deviations from optimality (see 
Cochran 1977), so that exact integer solutions are probably not cost effective. In fact, it seems 
likely that round-off error would be insignificant in comparison with the sampling errors in 
estimates of means and variances that would normally be available for developing an optimized 
survey design. 
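The size of the rounding effect can be gauged on the example of Section 6. The sketch below is a rough check under stated assumptions: it takes the multiplier-weighted precision units from the second column of Table 2 and the integer allocation from the same table, and evaluates the weighted constraint $\sum_i A_i / n_i$, which equals 1 at the continuous optimum under equal costs. The function name is hypothetical.

```python
# Second column of Table 2 and the rounded first-pass allocation.
COMBINED_UNITS = [33.6749, 3.4495, 2.9783, 7.6294, 4.9119, 1.3554]
ROUNDED_ALLOCATION = [90, 29, 27, 43, 34, 18]


def combined_constraint(units, allocation):
    """Multiplier-weighted relative variance; 1.0 means the bound is met exactly."""
    return sum(a / n for a, n in zip(units, allocation))


value = combined_constraint(COMBINED_UNITS, ROUNDED_ALLOCATION)
print(round(value, 4))   # a value just above 1 means a very slight shortfall
```

In this example the rounded allocation misses the weighted variance bound by well under one percent, which is small next to the sampling error in the variance estimates that feed the design.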

Finally, the reader will recall that finite population correction factors have been ignored
throughout this paper. It is easy to include these in the allocation model by manipulating
equations (1) and (3), although that would cause equation (13) to be somewhat imprecise.
However, it should be kept in mind that even when the FPC is non-negligible for some of the
strata, the overall effect usually is negligible. In any case, the FPC term,
$\sum_i W_i^2 S_i^2 / N_i$, can always be calculated in order to evaluate the situation and, if
necessary, it can be added to the variance term in formula (13) to obtain exact results.
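For reference, the origin of the FPC term can be seen from the usual variance of a stratified mean. The display below is a sketch in standard stratified-sampling notation ($W_i$ the stratum weight, $S_i^2$ the stratum variance, $n_i$ and $N_i$ the sample and population sizes, $V$ a variance bound). These symbols are assumed to correspond to those of equations (1), (3) and (13), which are not reproduced in this section.

```latex
\operatorname{Var}(\bar{y}_{st})
  = \sum_i W_i^2 S_i^2 \left( \frac{1}{n_i} - \frac{1}{N_i} \right)
  = \sum_i \frac{W_i^2 S_i^2}{n_i} - \sum_i \frac{W_i^2 S_i^2}{N_i},
\qquad\text{so that}\qquad
\operatorname{Var}(\bar{y}_{st}) \le V
\;\Longleftrightarrow\;
\sum_i \frac{W_i^2 S_i^2}{n_i} \le V + \sum_i \frac{W_i^2 S_i^2}{N_i}.
```

Under this reading, retaining the correction simply relaxes the variance bound by the FPC term, which does not depend on the allocation.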
The Role of Demographic Factors in the Analysis of Survey
Versus Diary Purchase Reporting Accuracy


EDWARD R. BRUNING and MICHAEL Y. HU¹


ABSTRACT 


In this article the authors evaluate the relative performance of survey and diary data collection methods 
in the context of the long-distance telephone communication market. Based on an analysis of 1,530 
respondents, the results indicate that two demographic variables, sex and income, are important in 
explaining the difference in survey reporting and diary recording of usage data. 


KEY WORDS: Survey; Diary; Data collection. 


1. INTRODUCTION 


A perusal of the marketing literature underscores our lack of knowledge regarding the relative
accuracy of survey and diary methods for collecting consumer expenditure data. Clearly, the
resolution of this issue has ramifications for researchers as well as those for whom the research
is conducted. Wind and Lerner (1979) stress the need to appropriately evaluate the two methods
and to identify the characteristics of those reporting purchase behavior accurately versus those
who show a high discrepancy between reported and actual consumption. To be sure, an analysis
of the discrepancy focuses attention on the data collection instrument, for the choice of
instrument could affect management decisions relating to “product positioning and market
segmentation strategies, advertising media and copy research, and concept/product testing”
(Wind and Lerner 1979).

The purpose of our article is to assess empirically the relationship between several 
demographic variables and the two expenditure reporting methods from a single sample of 
respondents in the U.S. long-distance telephone market. We present additional evidence on 
the issue initially posed by Wind and Lerner. First, the current state of knowledge regarding 
the nature of the two instruments is surveyed. Then the research methodology is described and 
findings from the long-distance telephone market are reported. We conclude with a number 
of implications relevant to both providers and users of consumer expenditure data. 


2. LITERATURE REVIEW 


The two prominent methods for recording household consumption expenditures are survey
(recall) methods, whereby household members are asked to recall expenditures made during
a predefined period, and the diary method, whereby a daily or weekly log is maintained which
identifies specific expenditures. Neter (1970) provides case examples and empirical studies which
address the relative advantages and disadvantages of the two expenditure collection devices
but do not compare their relative accuracies. In general, the survey approach possesses
advantages in economy while simultaneously possessing a number of disadvantages relative to the
diary method. Because of time and resource constraints, most researchers utilize the survey
method despite its many measurement problems.


¹ Edward R. Bruning and Michael Y. Hu, Graduate School of Management, Kent State University, Kent, Ohio, 44242.
It is commonly believed that diary methods have advantages over survey approaches
principally because diarists have the opportunity to record the event within a short period after
it has occurred. For this reason, Sudman and Ferber (1971) have all but discredited the survey
approach for collecting expenditure data and have suggested the exclusive use of diaries. But
the diary method is not problem free. The authors evaluated households in the Chicago area
in 1972 and found evidence of underreporting by the survey method with respect to the number
of purchases. They also found that respondents had difficulties in separating purchases into
specific item categories with the survey recall method.

A number of writers report that the diary approach is appropriate only for certain
expenditure categories (Pearl 1968; Grooteart 1986; Wind and Lerner 1979; Stanton and Tucci 1982).
Pearl (1968) has stated that individual diaries are to be preferred because of reporting
thoroughness. For large ticket items the method is preferred; however, reporting frequency
declines for small valued purchases. Grooteart (1986) adds to this prescription by suggesting
that all eligible household members keep diaries to reduce omissions in expenditure reporting.
Wind and Lerner (1979) and Stanton and Tucci (1982), in separate studies on expenditure
reporting for specific food items, substantiate the superiority of the panel method relative to
surveys.

The construction and design of the diary instrument pose collection problems (Kemsley
1961; Kemsley and Nicholson 1960; Lewis 1948; Sudman 1964a, b; Sudman and Ferber 1971;
Walsh 1977). Kemsley (1961) and Kemsley and Nicholson (1960) evaluated record books kept
on consumer expenditures over a three week period in 1953. They found that significant
variations occurred in expenditure recording over the three week period by type of expenditure
and by season of the year. Lewis (1948) evaluated the accuracy of weekly versus monthly diary
recording of grocery and clothing expenditures. The author found a 16% reduction in monthly
reporting in comparison to weekly expenditure reporting. Sudman (1964a) and Sudman and
Ferber (1974) studied alternative means of obtaining consumer expenditure data. They evaluated
the role of compensation, training of respondents, and method of reporting. In the studies
they conducted, compensation was significant in improving respondent cooperation and
accuracy, and direct training aided in respondent reporting accuracy. The frequency of
purchase and the construction of the reporting form were also important in reporting accuracy.

Other studies have focused more explicitly on consuming unit cooperation (Kemsley and
Nicholson 1960; Pearl 1968; Sudman and Ferber 1974). Kemsley and Nicholson (1960) report
that the size of the individual purchase has a significant effect upon the degree to which
respondents cooperate in reporting expenditures. Pearl (1968) and Sudman and Ferber (1974)
emphasize the importance of incentive payments, in terms of amount and duration, in generating
cooperative expenditure reporting.

An additional concern with the diary method is the extent of panel mortality (Sandage 1956;
Sodol 1959; Sudman 1964a, b) and panel decay (McKenzie 1983; Sandage 1956; Sodol 1959;
Sudman 1964a, b). Sandage (1956) investigated whether consumer panels develop bias as a
result of being interviewed. Based on three separate investigations on Indiana farm households
over the period 1947-1954, the author found that bias was not a significant concern with panel
collection methods. Sudman (1964a, b), however, found mortality tended to be greater for
male respondents. In addition, the degree of effort involved in recording appeared to have no
impact on accuracy or mortality rates for respondents involved in panel recording of
expenditures. In terms of panel decay, McKenzie (1983) reported that greater attrition occurred
with longer panel periods while Sandage (1956) found that repeated use of a given panel did not
result in a bias in reporting accuracy.

Parfitt (1967) argues that housewives in surveys recall accurately only purchases for
frequently bought products in the most recent past. Thus, the diary recording of past purchases
yields a more reliable and accurate measure than survey reporting. In surveys, respondents
typically are asked to report purchases over a long time period or to engage in a mental averaging
exercise to arrive at an expenditure figure for a typical week or month. As a consequence, Parfitt
(1967) concludes that a strong likelihood exists for respondents to exaggerate the amount and
frequency of purchases and to oversimplify the complexity of the expenditure decision.

As indicated in an earlier section of this paper, our research focuses on the accuracy of survey
versus diary purchase reporting. Only a few articles address this issue empirically. Wind and
Lerner (1979) analyze the validity of survey versus diary approaches in accounting for
consumer expenditures. Their data are taken from a sample of 450 housewives serving on an MRCA
consumer diary panel. The housewives completed a mail survey questionnaire and were
instructed to maintain a record of their expenditures on various brands of margarine for a six
month period. The results indicate a discrepancy in the relative accuracy of the two reporting
methods between the aggregate and the individual consumer response level. At the aggregate
level, survey and diary instruments are consistent in predicting the rank-ordering of brand
market shares. Major discrepancies are detected, however, at the consumer level, where survey
responses are less accurate than diary reporting. The authors attribute this inaccuracy
to ignorance, forgetfulness, poor survey questioning, reporting errors, falsification, and
interviewer bias.

Stanton and Tucci (1982), following the work reported by Wind and Lerner (1979), sample 
7,945 participants in the National Food Consumption Survey (1977-78). Personal interviews 
are used as the reporting vehicle for food expenditures which occurred in the previous twenty- 
four hour period. The participants were asked to maintain diaries of all food and beverage 
expenditures for two days following the interview. Their results indicate that, at aggregate levels, 
personal interviews provide information which is as accurate and reliable as diary reports. They 
were not able to address the relative accuracy of the two approaches at the consumer level 
because of the nature of the data. 

The apparent discrepancies in results reported by Wind and Lerner (1979) and Stanton and 
Tucci (1982) may be attributable to the differences in the time frames within which consumers 
operated in reporting expenditures. In Wind and Lerner’s study, respondents were requested to 
report the brand most often purchased. Questions of this nature require a greater amount of 
recall since the time reference is over an extended period. In Stanton and Tucci’s study, however, 
the reporting period is restricted to the previous twenty-four hours. Parfitt (1967) indicates 
that respondents are more effective in reporting recent purchases. In this light, Stanton and 
Tucci’s conclusion is not truly surprising and, furthermore, does not contradict the results of 
Wind and Lerner’s analysis since recall for both the survey and diary recording methods was high. 


3. THE STUDY 


During the years 1978 and 1979, AT&T (American Telephone and Telegraph Company)
initiated a major data collection effort with the objective of providing information for
corporate market planning and strategy formulation in its residential long-distance telephone
market. A nationally projectable sample of roughly 4,000 households was recruited and asked
to participate on a panel for a period of twelve months. The sample was demographically
balanced with respect to six variables: population density, income, marital status, age, sex and
geographical region of domicile.

The entire panel responded to a pre-assessment survey instrument administered through the
mail in January 1978. Once it was completed, each panel member was instructed to fill out a
weekly diary over the next twelve months. In the pre-assessment phase, respondents were asked to
respond to the question: “During an average or typical month, how often do you communicate
for non-business reasons with relatives and friends who reside at least 50 miles from your
home?” This measure is referred to as PERCEIVED 1. Also, each panel member in the
pre-assessment phase responded to the question: “Would you consider yourself a heavy, medium,
light or non-user of long distance calling?” This measure is referred to as PERCEIVED 2. So that
comparisons could be made between panel and survey data collection methods, panel
respondents were asked to record information on the frequencies of long-distance
communication by day of the week. This measure is referred to as REPORTED 1.

Throughout the entire study every attempt was made to conceal the sponsor of the project.
Moreover, the positions of the response categories were randomized in order to remove any
possibility of position bias. A sample of 2,350 respondents was retained after twelve months
of reporting. Panel attrition was perceived to be a potential problem in this study because
attrition rates may vary substantially among demographically defined subgroups. In order to
resolve this problem, a sample balance program was developed and used to randomly select a
subsample of participants from the pool of 2,350 respondents which would be demographically
balanced. After editing and sample balancing, 1,530 panel members who had completed the
pre-assessment and the twelve-month diaries were used in this study.


4. DATA ANALYSIS 


An important question in the pre-assessment survey asked the respondents to report their
“perceived” usage for a typical month [PERCEIVED 1]. In order to obtain consistency in
the unit of measurement, weekly diary recorded usage [REPORTED 1] is aggregated to twelve
monthly totals for each respondent. The aggregated diary reported measure is referred to as
REPORTED 2. Matched differences between “perceived” usage reported in the pre-assessment
survey [PERCEIVED 1] and “actual” usage extracted from diaries [REPORTED 2] are
calculated for each respondent for twelve monthly periods as well as for the average of the
twelve months. A one-way ANOVA design is employed monthly and for the twelve month
average to detect if significant variations exist with respect to the matched differences across
levels of several demographic variables: sex, income, education, and age. An a posteriori
contrast test is performed to compare all possible pairs of level means for each demographic
variable. Finally, to evaluate the effects of interactions among the four demographic variables,
a four-way ANOVA procedure is employed using the twelve-month average scores.
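The two computational steps just described, forming matched differences and comparing them across the levels of one demographic variable, can be sketched as follows. This is a minimal illustration and not the software used in the study: the function names and the toy numbers are hypothetical, and only the textbook one-way ANOVA F statistic is computed (no p-value or contrast test).

```python
def matched_differences(perceived, diary_monthly):
    """Survey-minus-diary difference per respondent, using the twelve-month diary average.

    perceived[k] is the PERCEIVED 1 answer of respondent k; diary_monthly[k]
    is that respondent's list of monthly diary totals (REPORTED 2).
    """
    return [p - sum(months) / len(months) for p, months in zip(perceived, diary_monthly)]


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one per factor level."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Toy data (hypothetical): differences for three levels of some factor.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value, df_b, df_w = one_way_anova_f(toy_groups)
print(f_value, df_b, df_w)
print(matched_differences([4, 2], [[3, 3], [1, 3]]))
```

In the study itself the groups would be the matched differences of the respondents at each level of sex, income, education, or age.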


5. RESULTS 


5.1 Survey and Diary Average Reported Usage 


Table 1 reports the average number of long distance telephone communications extracted
from respondent diaries for each of the twelve months as well as the usage for a typical month
[PERCEIVED 1] taken from the pre-assessment survey. Interestingly, this “perceived” usage
reported in the pre-assessment survey is substantially greater than actual recorded usage
[REPORTED 2] for each month of the analysis.

The diary averages indicate the presence of seasonality in the usage. December 1978 usage
of 4.123 is the highest among the twelve reported months. Even though the pre-assessment
survey requested the respondents to report usage for an average or typical month, it is quite
likely that they would use December 1977 as the basis for response since the pre-assessment
survey was administered in January 1978. A one-sample t-test indicates that the average of
the paired-difference between pre-assessment and the December diary usage, 0.235, is
significantly different from zero (p-value = 0.001). By the same token, t-test results for the
other eleven averages are statistically significant. These results imply that the respondents have
indeed over-estimated in the pre-assessment survey as compared to the diary reported usage.
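The average paired differences behind these tests follow directly from the means in Table 1. The sketch below only reproduces that arithmetic; the t-tests themselves require the respondent-level data, which are not available here. The variable names are illustrative.

```python
# Monthly diary means and the pre-assessment estimate, as listed in Table 1.
DIARY_MEANS = {
    "February": 3.516, "March": 3.878, "April": 3.486, "May": 3.610,
    "June": 3.414, "July": 3.604, "August": 3.606, "September": 3.250,
    "October": 3.426, "November": 3.518, "December": 4.123, "January": 3.891,
}
PRE_ASSESSMENT = 4.358

# Survey-minus-diary difference of the means, month by month.
differences = {m: round(PRE_ASSESSMENT - v, 3) for m, v in DIARY_MEANS.items()}
diary_average = sum(DIARY_MEANS.values()) / len(DIARY_MEANS)

print(differences["December"])               # the 0.235 quoted in the text
print(min(differences, key=differences.get))  # month with the smallest gap
print(round(diary_average, 3), round(PRE_ASSESSMENT - diary_average, 3))
```

Every monthly difference is positive, and December, the month with the highest diary usage, shows the smallest gap.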

A potential concern is that the reported usage in the pre-assessment survey could be
influenced by the unusually high usage in December 1977. If so, then it is argued that the results
of our study are subject to seasonality bias. In addressing this issue, the authors have
examined the difference between the reported usage in the pre-assessment survey and the December
1978 diary. Comparing the same months over a year of time could help to eliminate the
seasonality factor. As indicated in Table 1, this difference is statistically significant. This
difference, however, can be due to the difference in the data collection method and to a trend
factor since the comparison involves two different years. Assuming a positive trend in the usage
of services over time, the reported usage in December 1978 should be higher than that of
December 1977. The data from Table 1 indicate quite the contrary. Usage in December 1977 was
significantly higher than that reflected in December 1978. Thus, this evidence leads us to
conclude that there is indeed a significant difference due to the data collection method.
Respondents in our study had over-estimated their usage in the pre-assessment survey as
compared to their estimates reported in the diary.

Prior to our analyzing the relationship between the difference in survey versus diary data 
collection methods and the several demographic variables, it is important to evaluate the role 
played by actual usage in explaining this difference. Our reasoning for this test is that if the 
difference between survey reporting and diary recording is due to the absolute level of usage, 
then further analysis would prove suspect since experience (learning) would tend to bias our 
dependent variable (McKenzie 1983). On the other hand, if no statistical significance is 
attributable to the differences in collection methods and absolute usage levels, then the analysis 
with the demographic variables would be of greater validity. 


Table 1


Average Absolute Number of Long-Distance Telephone
Communications and Pre-assessment Survey Estimates


Month                              Average Absolute Number of
                                   Communications
February                           3.516
March                              3.878
April                              3.486
May                                3.610
June                               3.414
July                               3.604
August                             3.606
September                          3.250
October                            3.426
November                           3.518
December                           4.123
January                            3.891
Pre-assessment Survey Estimate     4.358


n = 1,530
Table 2


One-Way ANOVA Results Relating the Degree of Long-Distance Telephone Usage and 
the Difference Between Survey and Diary Reporting (12 month average) 


Degree of Usage 


Heavy Medium Light Non-User P-Value 
Mean Difference 
(survey-diary) 0.762 0.799 0.795 0.580 0.9905 
n 316 605 547 45 


Table 2 reports the results of the analysis of the relationship between the difference in survey
[PERCEIVED 1] and diary [REPORTED 2] reportings and the degree of absolute usage
[PERCEIVED 2]. McKenzie (1983) evaluated the form of both response and recording bias
involving the collection of telephone call details by diary methods. Response rate was found to
vary with customer usage. Furthermore, telephone usage recording rates tended to decrease with
usage as well. Thus, telephone call data collected by diary methods are subject to several
biases. Our study focuses on the difference in survey versus diary collected data and customer
usage, where the emphasis lies with the discrepancy between “perceived” and “actual”
consumption/purchase and the level (degree) of usage. Even though recording biases exist with
both methods, the difference between the two recordings is not related to usage.

In addition, the validity of using PERCEIVED 2 as a categorization variable can be
examined by correlating this measure with REPORTED 2 and PERCEIVED 1. REPORTED 2 and
PERCEIVED 1 measurements were first categorized into heavy, medium, light, and non-user
employing different cut-off levels. Cross-tabulations were then conducted between
PERCEIVED 2 and these two categorical measures. Significant statistical relationships were
detected in all cases.

Our dependent variable is the difference in the survey [PERCEIVED 1] and diary usage
recordings [REPORTED 2] and the independent variable is the degree of usage divided into
four levels: heavy, medium, light and non-user [PERCEIVED 2]. The results from the one-way
ANOVA procedure using the least-squares estimation procedure indicate that the degree
of usage is not statistically significant (p = .9905) in explaining the recorded usage difference
between the survey and diary methods. A one sample t-test of each of the four individual group
means showed that each mean was statistically different from zero at the 0.01 significance level.
Therefore, the positive mean values indicate that respondents in each of the four usage groups
tend to over-estimate usage in the pre-assessment survey relative to the diary recording
method.


5.2 Relationship Between Survey and Diary Reported Usage Differences and Selected 
Demographic Variables 


In Table 1 we reported the existence of a substantial difference between survey and diary
collection methods for the same respondents over a twelve month period. An interesting
question is: what accounts for the perceptual bias in survey reporting of purchase data? To answer
this question, a number of demographic factors are evaluated. Several levels of each factor
are specified and a one-way ANOVA procedure is employed to account for the reporting
differences. Tables 3 through 7 report the results of the analyses.
Table 3


One-Way ANOVA Results Relating Sex of Respondent and
the Difference Between Survey and Diary Reporting of Data


Differences by Sex
(Survey - Diary)


Month          Male       Female      ANOVA p-value
February       0.412      1.135       0.006*
March         -0.015      0.818       0.005*
April          0.379      1.201       0.002*
May            0.310      1.304       0.008*
June           0.562      1.205       0.016**
July           0.376      1.008       0.018**
August         0.395      0.987       0.031**
September      0.927      1.225       0.258
October        0.605      1.149       0.042**
November       0.593      1.003       0.129
December      -0.112      0.464       0.041**
January        0.164      0.675       0.075
Mean^a         0.380      0.990       0.010*
n              617        911


a Twelve month average.
* Significant at the 0.01 level.
** Significant at the 0.05 level.


5.3 Sex 


The relationship between the difference in survey and diary recordings of usage and sex of
the respondent is depicted in Table 3. The one-way ANOVA p-values are statistically
significant for 9 of the 12 months at the 0.05 level or below and significant at the 0.01 level for
the twelve month average. Thus the results indicate that both male and female respondents
over-estimate their actual usage of long distance telephone service and that females
over-estimate to a greater degree than do males.


5.4 Income 


In Table 4 we present the difference between survey and diary usage reports in relation to
respondents’ household income level. For 6 of the 12 months the one-way ANOVA p-values
are statistically significant at the 0.05 level or better and the 12-month average is significant
at the 0.037 level. Furthermore, the results of Tukey’s Studentized t-test indicate that
respondents with annual household income in the Category 1 range ($5,000 or less) are
statistically distinct from respondents earning incomes within the range of $10,001 to $20,000.

An obvious anomaly in the findings reported in Table 4 is that for respondents within the
lowest income category ($5,000 or less), estimated average monthly usage is below the actual
monthly usage in 9 of the 12 periods. Furthermore, with increasing household income a
definite tendency to over-estimate usage occurs, although this tendency begins to subside at the
highest income category. At lower income levels consumers may perceive long-distance
telephone service as a luxury item with respect to the other modes as well as with regard to other
consumer expenditures. Consequently, when asked to report expected usage, as in a survey,
respondents from this income stratum tend to discount their perceived usage because of the
belief that limited monies should be spent elsewhere. At the actual point of consumption,
however, relative values may have changed, since the urgency of the situation may dictate that a
long-distance telephone call is indeed the low-cost option relative to alternative communication
means. Thus, survey reporting of planned usage may deviate from diary recordings of actual
usage because of situational factors that intervene during the time of consumption.


Table 4


ANOVA Results Relating Respondent’s Income Level and
the Difference Between Survey and
Diary Reporting of Long-Distance Telephone Usage


Differences by Income (Survey - Diary)


Month         0-$5,000   $5,001-    $10,001-   $15,001-   Over       p-value
                         10,000     15,000     20,000     20,000
              (1)        (2)        (3)        (4)        (5)
February      -0.010     -0.583      1.180      1.120      0.571     0.110
March         -0.480     -0.738      0.780      1.009     -0.062     0.019**
April         -0.337      0.851      1.188      1.258      0.550     0.031**
May           -0.327      0.560      0.928      0.991      0.636     0.220
June           0.102      0.911      1.027      1.331      0.756     0.249
July          -0.439      0.500      0.895      1.050      0.694     0.128
August        -0.408      0.512      1.021      1.243      0.498     0.036**
September      0.306      0.798      1.298      1.367      0.976     0.301
October       -0.469      0.542      1.231      1.413      0.720     0.009*
November       0.010      0.494      0.941      1.214      0.741     0.248
December      -1.010      0.060      0.209      0.792      0.101     0.050**
January       -0.633     -0.339      0.654      0.956      0.392     0.030**
Mean^{a,b}    -0.308      0.297      0.946      1.145      0.548     0.037**
n              98         168        378        341        536


a Twelve month average.
b Tukey’s Contrast Test: (1) and (4) and (1) and (3) are different at the p = 0.05 level.


* Significant at the 0.01 level.
** Significant at the 0.05 level.
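The statement about the lowest income category can be checked against column (1) of Table 4. The short sketch below counts the months with a negative survey-minus-diary difference and recomputes the twelve-month average as a simple mean of the monthly values. That the tabulated mean is such a simple average is an assumption, although the result agrees with the printed figure.

```python
# Column (1) of Table 4: survey-minus-diary differences, February through January,
# for respondents with household income of $5,000 or less.
LOWEST_INCOME_DIFFS = [-0.010, -0.480, -0.337, -0.327, 0.102, -0.439,
                       -0.408, 0.306, -0.469, 0.010, -1.010, -0.633]

# Months in which the survey estimate fell below diary usage.
negative_months = sum(1 for d in LOWEST_INCOME_DIFFS if d < 0)

# Simple average over the twelve months (assumed to match the "Mean" row).
twelve_month_mean = round(sum(LOWEST_INCOME_DIFFS) / len(LOWEST_INCOME_DIFFS), 3)

print(negative_months, twelve_month_mean)
```

The count of nine negative months and the mean of -0.308 both match the text and the table, so this group is the only income category that under-estimates on average.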

As respondents’ household incomes increase, long-distance telephone use is still perceived as
a superior good; however, whereas respondents in lower income levels perceive long-distance
telephone use as an expendable (and perhaps frivolous) purchase, wealthier respondents
“expect” to employ the telephone more often than the other modes. Thus, when surveyed as to
their “expected” usage, wealthier respondents tend to overestimate the number of long-distance
telephone communications since in most situations it is their preferred method of communicating.


5.5 Age 


Table 5 reports that respondents at every age level tend to over-estimate their “perceived”
usage relative to “actual” usage as recorded in diaries. Although the one-way ANOVA p-values
indicate no significant relationship between measurement methods and age of the respondent,
in 10 of the 12 months respondents less than 31 years of age incurred the lowest difference
relative to older respondents. In addition, average differences for respondents between 31 and
40 and over 50 were lower than the average for the 41-50 group in each of the twelve periods.
Thus, the relationship between differences in survey and diary usage reports and age of
respondent is a monotonically increasing function up to age 50, after which the difference,
although still positive, declines. Again, the average differences across the various age levels
are not statistically significant based on the one-way ANOVA or the Tukey Studentized t-tests.


Survey Methodology, June 1989 67


Table 5


One-Way ANOVA Results Relating Respondent’s Age and
the Difference Between Survey Reporting and
Diary Recording of Long-Distance Telephone Usage


Differences by Age (Survey - Diary)


Month         Below 31    31-40     41-50     Over 50    p-value
              (1)         (2)       (3)       (4)
February      0.632       0.749     1.026      0.949     0.310
March         0.016       0.348     0.837      0.709     0.210
April         0.413       1.083     1.174      0.889     0.209
May           0.305       0.706     1.085      0.923     0.217
June          0.525       0.845     1.570      0.989     0.080
July          0.535       0.706     1.226      0.667     0.371
August        0.507       0.807     1.070      0.667     0.578
September     0.924       1.003     1.459      1.109     0.580
October       0.789       0.816     1.307      0.903     0.583
November      0.632       0.805     1.415      0.741     0.240
December      0.337       0.203     0.574     -0.030     0.494
January       0.603       0.519     0.922      0.069     0.197
Mean^a        0.518       0.716     1.139      0.715     0.385
n             383         374       270        495


a Twelve month average.
* Significant at the 0.01 level.
** Significant at the 0.05 level.
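The month-by-month pattern across age groups can be read off Table 5 mechanically. The sketch below is illustrative only (the variable names are not from the study): it counts the months in which the below-31 group shows the smallest difference and the months in which the 41-50 group shows the largest.

```python
# Rows of Table 5, February through January: (below 31, 31-40, 41-50, over 50).
AGE_DIFFS = [
    (0.632, 0.749, 1.026, 0.949), (0.016, 0.348, 0.837, 0.709),
    (0.413, 1.083, 1.174, 0.889), (0.305, 0.706, 1.085, 0.923),
    (0.525, 0.845, 1.570, 0.989), (0.535, 0.706, 1.226, 0.667),
    (0.507, 0.807, 1.070, 0.667), (0.924, 1.003, 1.459, 1.109),
    (0.789, 0.816, 1.307, 0.903), (0.632, 0.805, 1.415, 0.741),
    (0.337, 0.203, 0.574, -0.030), (0.603, 0.519, 0.922, 0.069),
]

# Months in which the youngest group has the smallest survey-minus-diary gap.
youngest_lowest = sum(1 for row in AGE_DIFFS if row[0] == min(row))

# Months in which the 41-50 group has the largest gap.
group_41_50_highest = sum(1 for row in AGE_DIFFS if row[2] == max(row))

print(youngest_lowest, group_41_50_highest)
```

The youngest group has the smallest gap in 10 of the 12 months, and the 41-50 group has the largest gap in all twelve, which is the rise-then-decline pattern described in the text.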


5.6 Education 


The relationship between the difference in survey versus diary reported usage and
respondents’ level of education is depicted in the statistics found in Table 6. As reported in the
table, a general tendency to over-estimate usage in surveys is characteristic of respondents at
all education levels. Respondents with the least amount of formal education tend to over-estimate
usage in survey reporting to a lesser extent than respondents with more formal education. The
greatest tendency to over-estimate usage in surveys occurs for respondents who have completed
high school followed by those who have had some college. The results of the one-way ANOVA and
Tukey Studentized t-tests, however, indicate that the differences across education levels are
not statistically significant at the p = 0.05 level.


5.7 Four-Way ANOVA Results 


The main and interaction effects of the demographic variables as explanations of the difference
between survey and diary purchase data reporting are presented in Table 7. Income, sex, and
their interaction are the only variables with statistically significant p-values. All other
main and interaction effects are insignificant in explaining variations in the difference variable.
Table 6


One-Way ANOVA Results Relating Respondent's Education Level and the
Difference Between Survey and Diary Reported Usage


Differences by Education (Survey - Diary)


Month        Some          Completed      Some        Completed
             High School   High School    College     4-Yr. Col. Deg.   p-Value
             (1)           (2)            (3)         (4)
February      0.790        1.015          0.951       0.578             0.546
March         0.290        0.853          0.592       0.059             0.117
April         0.556        [illegible]    0.979       0.445             0.078
May           0.685        1.134          0.756       0.345             0.139
June          0.548        1.158          [illegible] 0.696             0.368
July          0.194        0.931          1.021       0.467             0.195
August        0.347        0.891          0.845       0.620             0.681
September     1.040        [illegible]    1.190       1.018             0.959
October       0.468        1.195          0.826       0.878             0.475
November      0.508        1.119          0.896       0.592             0.383
December      0.081        0.500          0.244       0.061             0.558
January      -0.097        0.626          0.842       0.129             0.138
Mean(a)       0.438        0.986          0.854       0.491             0.294
n             124          476            431         490


(a) Twelve month average.
* Significant at the 0.01 level. 
** Significant at the 0.05 level. 


Table 7 


Four-Way ANOVA Results Relating 
Demographics (Sex, Education, Age, and Income) 
to the Differences Between Survey and 
Diary Recording of Long-Distance Telephone Usage 


Variable           Df   Sum of Squares   F-Value   p-Value
Sex                 1   131.082          6.48      0.011**
Education           3    79.001          1.30      0.272
Age                 3    58.465          0.96      0.409
Income              4   210.077          2.60      0.033**
Sex & Education     3    77.629          1.28      0.280
Sex & Income        4   220.032          2.72      0.028**
Sex & Age           3    47.311          0.78      0.506
Ed. & Income       12   263.931          1.09      0.367
Ed. & Age           9    81.083          0.45      0.911
Income & Age       12   211.718          0.87      0.576


* Significant at the 0.01 level 
** Significant at the 0.05 level. 
6. CONCLUSION 


The findings of our study indicate that, at the individual respondent level, survey data are 
very inaccurate in measuring the respondents’ actual usage of long-distance telephone com- 
munication. Our results support the earlier conclusions of Parfitt (1967), Sudman (1964) and 
Wind and Lerner (1982) who analyzed this issue with respect to non-service related consumer 
products. We cannot report, however, that our results either support or refute those of Stanton 
and Tucci (1984) since the time frames, and thus the recall periods, are considerably different 
in the two studies. 

The importance of our findings extends beyond simply confirming the results of previous
studies and extending the range of product types to include the analysis of a consumer service
item. Our findings show that the over-reporting that occurs in surveys varies along
two important demographic dimensions: respondents' household income and sex. Respondents
who report very low household income tend to under-estimate usage in survey reporting while
wealthier respondents do the opposite. Furthermore, the degree of over-estimation tends to
increase monotonically with income and then declines. Female respondents tend to over-estimate
usage in surveys by a considerably greater magnitude than male respondents.
Taken together, the findings suggest a strong possibility of measurement problems occurring
if purchase data are collected using the survey method.
Quality Assurance Sampling for Evaluating Health Parameters 
in Developing Countries 


STANLEY LEMESHOW and GEORGE STROH, JR.¹


ABSTRACT 


A typical goal of health workers in the developing world is to ascertain whether or not a population 
meets certain standards, such as the proportion vaccinated against a certain disease. Because popula- 
tions tend to be large, and resources and time available for studies limited, it is usually necessary to 
select a sample from the population and then make estimates regarding the entire population. Depen- 
ding upon the proportion of the sample individuals who were not vaccinated, a decision will be made 
as to whether the coverage is adequate or whether additional efforts must be initiated to improve cov- 
erage in the population. Several sampling methods are currently in use. Among these is a modified method 
of cluster sampling recommended by the Expanded Programme on Immunization (EPI) of the World 
Health Organization. More recently, quality assurance sampling (QAS), a method commonly used for 
inspecting manufactured products, has been proposed as a potentially useful method for continually 
monitoring health service programs. In this paper, the QAS method is described and an example of 
how this type of sampling might be used is provided. 


KEY WORDS: Lot sampling; Quality assurance; Acceptance sampling; Vaccination coverage. 


1. INTRODUCTION 


One of the problems continually confronting managers of health service programs is the 
identification and application of cost-effective and practical methods to monitor and evaluate 
operations. In developing countries the solution to such problems is usually complicated 
because records are often poorly maintained, reports from dispersed health facilities are 
usually received late or not submitted at all, and accurate target population sizes are not 
available. Consequently, community-based surveys are often the only means to obtain reliable 
numerator (i.e., number of individuals with a characteristic) and denominator (i.e., number
of individuals studied) data. However, such surveys can be difficult to organize and imple- 
ment and are often too costly to be used to monitor program operations. 

Perhaps the best example of a program in which community-based surveys have been 
routinely used to collect information is the Expanded Programme on Immunization (EPI) 
of the World Health Organization (WHO) (see Henderson and Sundaresan 1982). The EPI, 
from its inception, has employed a cluster sampling method designed to measure immunization
coverage in young children (see Serfling and Sherman 1975 and Henderson et al. 1973).
The particular survey methodology was kept as simple in concept and application as possible
to allow program managers and supervisors, often with minimal background in sampling
techniques, to organize and implement the surveys (see WHO 1979). These surveys, which have
been termed "30 by 7" surveys, typically involve 30 clusters and 7 individuals studied per
cluster. Indeed, the strength of the EPI survey method lies in the simplicity of the design,
the standardized rules for implementation, and the uncomplicated procedure for compiling
and interpreting results. Discussion and criticisms of the method on theoretical grounds are
available elsewhere (Lemeshow et al. 1985 and Lemeshow and Robinson 1985).

Recently, EPI officials have recognized several practical limitations of the survey method- 
ology. The first concern is that the results obtained with the survey method are relatively 
imprecise — estimates of coverage obtained can only be expected to be within 10 percentage 
points of the actual level of coverage in the population sampled. In developing countries where 
high levels of coverage have been attained, the method is too imprecise to identify significant 
changes between sequential surveys, or between different strata of a population being evaluated. 

The second concern about the use of the EPI surveys is that, even though they are relatively 
easy to implement, they are still too great an undertaking for most local managers to use to 
assess operations in their areas of responsibility. Consequently, it is still most common for an
EPI survey to be done for the entire population of a country, or for population units of relatively
large size (e.g., millions). Although the results are useful for managers at higher program levels,
local managers and supervisors are unable to use the results at their levels of responsibility. 

EPI surveys usually measure the percentage of children in an age cohort (usually 12 to 23 
months of age) that should have received the entire series of vaccines that are provided in the 
EPI. The third concern is that this results in measurement of operations that preceded the date 
of the survey by more than a year; operations may have changed considerably during that 
interval. 

Finally, an additional objective of the EPI is to develop accurate record keeping that can 
be used to monitor and evaluate coverage — the surveys are the primary means of assessing 
the validity of records. However, with the current age groups surveyed, it is often difficult to 
identify the set of records that correspond to the period during which immunizations were given 
to the children surveyed. 

In this paper, we present a method which has been proposed to continually monitor a health 
service program and can be used to assess whether operations are maintained at an acceptable, 
specified level. To do this, a particular type of stratified random sampling (Cochran 1977; 
Hansen et al. 1953; Kish 1965; Levy and Lemeshow 1980) is employed that uses very small 
samples obtained from operationally defined units of the population. Not only can this type 
of community-based sampling permit monitoring of operations within relatively small popula- 
tions or small areas of operation, but the results will permit managers at virtually all levels 
to obtain estimates to continually evaluate program operations with sufficient precision. In 
areas where record systems have been developed that can be used to monitor program opera- 
tions, the same sampling method can be used to validate the records and ensure that an accurate 
numerator and denominator are available from records. Once validated these records can then 
be relied upon as the major source of information for program monitoring and evaluation. 
The general term applied to this method of sampling, which we propose as a useful alternative 
to more traditional methods applied in the area of public health program evaluation, is Quality 
Assurance Sampling (QAS) — a term well known in the areas of engineering, manufacturing 
and business. 


2. THE QAS METHOD 


The origin of QAS is in sampling and inspecting manufactured products (Dodge and Romig
1959), where it was developed to keep labor and other sampling costs at minimal levels. One
type of QAS sampling, Lot Quality Acceptance Sampling (LQAS), is identical to stratified
sampling, but the samples are too small to provide what are usually considered acceptably
narrow confidence intervals for estimates for a specific stratum (usually called a "batch" or
"lot" in industry). Rather, a decision is made about the quality of a particular batch or lot
based on the probability that the number of defective items in the sample is less than or equal 
to a specified number. The results of the samples taken from all the mutually exclusive and 
exhaustive batches can be combined to provide a precise overall estimate of the average quality 
of the total product. 

The strategy and goals of QAS in the health field would be similar to those in the manufacturing
field. The purchaser of goods does not want to accept a batch with more than a certain
percentage (P_0) defective whereas the manufacturer wants to continually monitor production
to identify products with more than an expected percentage (P_a) of defectives. It is not unusual
for P_0 and P_a to be unequal. It is not difficult to see the similarities between the objectives of
a manufacturer and a health manager or supervisor. The latter ‘‘produces’’ immunized children 
rather than a manufactured item. 

Generally, a lot is an ‘‘operationally useful’’ unit. For example, in an industrial applica- 
tion, if there were several machines producing the same part and three operators assigned to 
each machine, then ‘‘lots’’ could be chosen that are produced by the same machine — par- 
ticularly if any variation in the parts produced is most likely to be due to machine drift as 
opposed to operator input. 

For public health work, a manager might define ‘‘lots’’ as recipients of services from a single 
operational unit — such as a health post (HP) immunization team — over a specified period 
of time. The amount of time between sampling could coincide with the interval between ‘‘high 
incidence’’ seasons for immunizeable diseases, but would more likely be related to the amount 
of time and cost associated with the sampling than any other single consideration. 

In public health work a serious error would be made if the population were judged to be 
adequately covered (‘‘accept the lot’’) when, in fact, it is not. In order to control for this 
possibility, we design the procedure as a one-sided test. 

The null hypothesis, illustrated at the 50% level, is


$$H_0: P \ge P_0 \quad \text{(i.e., proportion of unvaccinated children} \ge 0.50\text{)}$$

versus

$$H_a: P < P_0 \quad \text{(i.e., proportion of unvaccinated children} < 0.50\text{)}.$$


The four-celled table presented in Figure 1 describes the consequences of the testing procedure. 
Because the test is set up as one-sided, and because we assume the population is not adequately 
covered unless we reject H_0, the type I error, i.e., accepting the lot when it is defective (false
negative), is the most serious error. That is, if (using the example of immunization) a popula- 
tion (lot) of children is thought to have an acceptable proportion immunized when, in fact, 
it does not, the larger number of susceptibles in the population increases the risk of transmis- 
sion of the disease. Hence, we consider the ‘‘cost’’ of declaring that the population is adequately 
vaccinated, when it is not, to be high. On the other hand, the type II error, rejection of an 
acceptable lot, is not as serious since the result of a false-positive decision would be to concen- 
trate efforts on an already adequately vaccinated population. 

The fundamental problem in LQAS sampling is not so much one of simply determining
sample size, but of choosing an appropriate balance between sample size and critical region.
In all cases, the computation of β will depend upon the actual value of P when it is assumed
to be different from P_0.
                                    Actual Population

Decision                            Not adequately vaccinated        Adequately vaccinated

Fail to reject H_0:                 1 - α (sensitivity): test        β (false positive rate):
"reject the lot"                    recognizes or is sensitive to    "Provider Risk"
("not adequate coverage")           lack of adequate coverage

Reject H_0:                         α (false negative rate):         1 - β (specificity): test
"accept the lot"                    "Consumer Risk"                  recognizes adequate coverage
("adequate coverage")

Figure 1. Consequences of Hypothesis Testing in LQAS Procedure 


In practice, initially a minimal level for delivery of a service would be defined on the basis 
of the probable distribution of service levels across lots as well as in terms of practicality (i.e.,
a level that could be achieved). Once this level is defined, sample size options are considered 
relative to the number of lots that would be misclassified with stated type I and type II errors. 
If the sample size were too large to be practical, there would be several options including: 
retaining the sampling scheme, but lengthening the time interval between sampling; choosing 
another critical level that would allow use of a smaller sample size; choosing another QAS 
sampling scheme (such as double sampling or sequential sampling) that would meet the objec- 
tives of classifying the lots and still be operationally feasible; and abandoning a QAS scheme. 

Probabilities and the necessary sample sizes can be computed using the binomial distribution.
We will assume, as is usually the case, that N is very large relative to n; with large N, the
Poisson can be practically substituted for the binomial. However, if it happens that N is not
large relative to n, then the hypergeometric distribution can be used as described in Brownlee
(1965, Sec. 3.15). Letting P denote the probability of observing the characteristic, the chance
of observing exactly d individuals with the characteristic in a sample of size n is given by


$$\Pr(d) = \binom{n}{d} P^{d} (1 - P)^{n - d}.$$


Suppose we decide that n = 7 is the sample size we wish to use. The rejection region for the test
states that we should reject H_0 (and "accept the lot" as adequately vaccinated) if d ≤ d*. To
determine the value of d* such that Pr(d ≤ d*) = α, we must compute Pr(d ≤ d*) for a
number of values of d*. For example, with P_0 = 0.50, if we decide to use d* = 1 then
Pr(d ≤ d*) would equal 0.0625
and the power of the test, if 70% of the population is actually unvaccinated, would equal 0.0038.
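
The two figures quoted above can be checked directly from the binomial formula. The following is a minimal sketch; the function name `prob_at_most` is ours and is only illustrative.

```python
from math import comb

def prob_at_most(d_star: int, n: int, p: float) -> float:
    """Pr(d <= d_star) when d ~ Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(d_star + 1))

# n = 7, d* = 1: type I error evaluated at P_0 = 0.50
alpha = prob_at_most(1, 7, 0.50)
# Same rule evaluated when 70% of the population is unvaccinated
at_70 = prob_at_most(1, 7, 0.70)

print(round(alpha, 4), round(at_70, 4))  # 0.0625 0.0038
```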

Results of a particular choice of n and d* may be graphed as an operating characteristic
(OC) curve where the variable on the horizontal axis is the proportion, P, in the population
who have not been vaccinated. The vertical axis presents the probability of rejecting the null
hypothesis H_0: P ≥ P_0 and concluding that the vaccination coverage in the population is
adequate. Each combination of n and d* will generate a unique curve. Figure 2 presents a typical
OC curve for n = 7, d* = 1.


Figure 2: Operating Characteristic Curve for n = 7 and d* = 1
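
Since the plotted curve is not reproduced here, its points can be tabulated instead. This sketch evaluates the OC curve for n = 7 and d* = 1 on a grid of P values; the grid spacing of 0.1 and the name `oc_probability` are our own choices for illustration.

```python
from math import comb

def oc_probability(n: int, d_star: int, p: float) -> float:
    """Probability of rejecting H_0 (accepting the lot) when a fraction p is unvaccinated."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(d_star + 1))

# Evaluate the curve at P = 0.0, 0.1, ..., 1.0
curve = [(k / 10, oc_probability(7, 1, k / 10)) for k in range(11)]
for p, prob in curve:
    print(f"P = {p:.1f}   Pr(accept lot) = {prob:.4f}")
```

The probability falls steadily from 1 at P = 0 to 0 at P = 1, which is the shape an OC curve for a one-sided rule of this kind should have.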


The investigator will usually choose the value of d* which yields a type I error less than α.
Sometimes this strategy results in an extremely conservative test. For example, with n = 7,
d* = 0 and P_0 = 0.5, α would equal 0.0078. Here the use of d* = 1 with α = 0.0625 as in
Figure 2 might be justified. Table 1 presents values of d* for small n (≤ 20) such that α will
not exceed the stated type I error probability (0.01, 0.05 or 0.10) for various combinations of
n and P_0. Details for the construction of this table are presented elsewhere (Dodge and Romig
1959).
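
The rule described above (the largest d* whose type I error does not exceed the stated level) can be sketched as a short search. This is an illustration based on the exact binomial, not necessarily the construction used by Dodge and Romig; the function name `critical_value` is hypothetical.

```python
from math import comb
from typing import Optional

def critical_value(n: int, p0: float, alpha: float) -> Optional[int]:
    """Largest d* with Pr(d <= d* | n, p0) <= alpha; None if even d* = 0 exceeds alpha."""
    cumulative = 0.0
    d_star = None
    for d in range(n + 1):
        cumulative += comb(n, d) * p0**d * (1 - p0)**(n - d)
        if cumulative > alpha:
            break  # adding this d would push the type I error above alpha
        d_star = d
    return d_star

print(critical_value(7, 0.50, 0.10))  # 1
print(critical_value(7, 0.50, 0.05))  # 0
print(critical_value(5, 0.50, 0.01))  # None (no test for this sample size)
```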

The choice of the sampling scheme comes down to one of combining the desired power,
1 - β, with the desired α level. Rather than providing curves which are difficult to read
precisely, we developed Table 2 which presents values of (n, d*) pairs for α = 0.05, β = 0.20,
and selected values of P under the null hypothesis (P_0) and P under the alternative hypothesis
(P_a). In this table, (n, d*) are chosen so that Pr(d ≤ d* | n, P_0) ≤ α and Pr(d ≤ d* + 1 |
n, P_0) > α. More details are provided elsewhere (Lemeshow et al. 1987).
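
A search of this kind can be sketched as follows. We assume, as our own reading, that the power requirement is Pr(d ≤ d* | n, P_a) ≥ 1 - β and that the smallest qualifying n is wanted; the exact binomial is used throughout, so results need not coincide with the published table if it was built with an approximation. The names `cdf` and `smallest_plan` are hypothetical.

```python
from math import comb

def cdf(d_star, n, p):
    """Pr(d <= d_star) for d ~ Binomial(n, p)."""
    return sum(comb(n, d) * p**d * (1 - p)**(n - d) for d in range(d_star + 1))

def smallest_plan(p0, pa, alpha=0.05, beta=0.20, n_max=200):
    """Smallest n (with its d*) meeting both error constraints, or None within n_max."""
    for n in range(1, n_max + 1):
        d_star = None
        for d in range(n + 1):          # largest d* keeping the type I error <= alpha
            if cdf(d, n, p0) > alpha:
                break
            d_star = d
        # assumed power condition: lot accepted with probability >= 1 - beta at pa
        if d_star is not None and cdf(d_star, n, pa) >= 1 - beta:
            return n, d_star
    return None

print(smallest_plan(0.50, 0.10))  # (8, 1)
```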

This table clearly shows the trade-off one must make between power and sample size in LQAS
surveys. For instance, it is essentially impossible to have α = 0.05, β = 0.20 and use n = 5 unless
P_a under the alternative was actually close to 0. Hence investigators with limited resources
must be ready to compromise on the value of β or the difference between P_0 and P_a.

The method of quality assurance sampling described to this point is known as "single
sampling" since only one sample is taken before a decision is reached regarding the disposition
of the lot. A modification of this LQAS procedure, which may be useful under certain
field conditions, incorporates a "double sampling" strategy. With this method, a sample is
first selected of size n_1. If this sample fails, a second sample of size n_2 may be selected. This
requires the specification of two acceptance numbers. The first, d_1, applies to the observed
number of defectives in the first sample alone and the second, d_2, applies to the total number
of defectives in the first and second samples combined. In practice, the principal advantage
Table 1
Values of d* for Combinations of P_0 and n to Achieve α ≤ 0.01, 0.05, or 0.10


         P_0, α ≤ 0.01                  P_0, α ≤ 0.05                  P_0, α ≤ 0.10
 n    0.50 0.60 0.70 0.80 0.90     0.50 0.60 0.70 0.80 0.90     0.50 0.60 0.70 0.80 0.90


 5      x    x    0    1    2        0    0    1    1    2        0    1    1    2    3
 6      x    0    0    1    2        0    1    1    2    3        0    1    2    3    3
 7      0    0    1    2    3        0    1    2    3    4        1    2    2    3    4
 8      0    1    1    2    4        1    2    2    3    5        1    2    3    4    5
 9      0    1    2    3    5        1    2    3    5    5        2    3    4    5    6
10      0    1    2    4    5        1    2    4    5    6        2    3    4    5    6
11      1    2    3    4    6        2    3    4    5    7        2    4    5    6    8
12      1    2    4    5    7        2    3    5    6    8        3    4    5    7    8
13      1    3    4    6    8        3    4    5    7    9        3    5    6    8    9
14      2    3    5    6    9        3    4    6    8   10        4    5    7    8   10
15      2    4    5    7    9        3    5    7    8   10        4    6    7    9   11
16      2    4    6    8   10        4    5    7    9   11        4    6    8   10   12
17      3    4    6    8   11        4    6    8   10   12        5    7    8   10   13
18      3    5    7    9   12        5    6    8   10   13        5    7    9   11   14
19      4    5    7   10   13        5    7    9   11   14        6    8   10   12   14
20      4    6    8   11   13        5    7   10   12   15        6    8   10   13   15

x No test for this sample size.

Table 2
Sample Size and Decision Rule (n, d*) for LQAS, α = 0.05, β = 0.20,
One-sided Test

                                   P_0
P_a     0.50        0.60        0.70        0.80        0.90

0.05    [entries illegible]
0.10    [entries illegible]
0.15    [entries illegible]
0.20    [entries illegible]
0.25    [entries illegible]
0.30    [entries illegible]
0.35    [entries illegible]
0.40    153, 66     [remaining entries illegible]
0.45    617, 288    [remaining entries illegible]
0.50                [entries illegible]
0.55                601, 340    [remaining entries illegible]
0.60                            137, 86     29, 19      [illegible]
0.65                            [entries illegible]
0.70                                        109, 80     [illegible]
0.75                                        [entries illegible]
0.80                                                    [illegible]
0.85                                                    [illegible]


x Sample size less than 5.




of double sampling is that, if the defective rate is relatively low, it may be possible to study
fewer subjects than with single sampling since n_1 is typically less than the n required in single
sampling. However, if it becomes necessary to go to the second sample in many of the lots,
the procedure may require a larger overall sample size. In most cases, the total sample size would
be less than n_1 + n_2 since sampling stops as soon as the critical value, d_2, is exceeded in the
second sample. (The first sample is always completed to provide the information to be combined
and used to compute the overall proportion acceptable in the population.) Details for
this procedure are presented elsewhere (Dodge and Romig 1959) and an example is
presented in Section 4.


3. ESTIMATING THE OVERALL POPULATION PROPORTION 
WITH QAS SAMPLING 


In addition to the binary decision to "accept" or "reject" the lot, the simple random samples
within each HP may be considered a stratified sample and an overall population estimate 
constructed. 

For example, suppose 294 HP’s of known population size were sampled selecting 7 children 
from each. Using standard stratified sampling formulae, estimates may be obtained for P, 
Var(P), and an appropriate confidence interval may be constructed. LQAS resembles stratified 
sampling in that it requires that an accurate sampling frame be established in each lot and that 
a simple random sample be selected from each of these lots. However, it does not provide more 
information than conventional stratified random sampling since confidence intervals could 
be established for each stratum (or lot) and decisions could be based on values covered by each 
such interval (if sample sizes were made large enough to provide useful confidence intervals). 
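
The stratified estimate referred to above can be sketched with the usual textbook formulae, treating each lot as a stratum. The lot sizes and counts below are invented purely for illustration, and the function name `stratified_proportion` is ours.

```python
from math import sqrt

def stratified_proportion(lots):
    """lots: list of (N_h, n_h, d_h) = lot population, sample size, number with the characteristic.

    Returns the stratified estimate of P and its estimated variance,
    including the finite population correction (1 - n_h / N_h).
    """
    total = sum(N for N, _, _ in lots)
    estimate = 0.0
    variance = 0.0
    for N, n, d in lots:
        weight = N / total
        p = d / n
        estimate += weight * p
        variance += weight**2 * (1 - n / N) * p * (1 - p) / (n - 1)
    return estimate, variance

# Hypothetical data: three lots, 7 children sampled in each
lots = [(88, 7, 2), (120, 7, 4), (60, 7, 1)]
p_hat, var_hat = stratified_proportion(lots)
half_width = 1.96 * sqrt(var_hat)
print(f"P = {p_hat:.3f}, 95% CI = ({p_hat - half_width:.3f}, {p_hat + half_width:.3f})")
```

With only three lots the interval is very wide; the point of combining lots is that the variance shrinks as the number of lots grows.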

Although the n for each stratum in LQAS are too small to provide useful confidence intervals
for estimates for each stratum, an appropriately designed LQAS scheme may provide a means 
for continually testing strata and classifying them as ‘‘acceptable’’ or ‘‘unacceptable’’ in terms 
of a particular outcome. This results from the fact that LQAS sample sizes are relatively small, 
increasing the likelihood that sampling can be done more frequently. Among its benefits, the 
rules of LQAS sampling are simple to follow, requiring minimal retraining of the 
surveyor/classifier. Lastly, since LQAS samples are, in fact, stratified random samples, the 
results for strata can be combined to provide adequately precise estimates for groups of strata, 
such as for districts, regions, or a nation as a whole. 

The potential benefits of use of an LQAS scheme must be weighed against the loss of preci- 
sion expected with the small samples taken in each stratum. Perhaps the best way for the reader 
to judge whether LQAS might be useful is an example in which a conventional stratified random 
sample survey approach is compared with an LQAS scheme. 


4. AN EXAMPLE OF THE APPLICATION OF QAS 


The example is set in circumstances similar to those in Costa Rica, and is applied to 
immunization coverage of children which is provided by 294 HP that cover the population of 
the country. The manager of the EPI would like to know the percentage of children, 12-23 
months of age that received all of the immunizations that should have been given during their 
first year of life. Based on the immunizations that have been reported by staff, the manager 
thinks that the coverage level for the nation is about 60%, but the coverage that has been 
reported by the 294 individual HP varies from 20% to 100%; it is thought that the distribution 
of coverage rates is uniform across the range. The EPI manager suspects that the estimates 
of coverage provided on reports may not be completely accurate because of numerator and 
denominator errors. As a result, it is decided that a survey of HP areas should be made in order 
to obtain estimates of coverage for each of the 294 areas since it would be important to be able 
to concentrate supervision on those HPs that have ‘‘low’’ coverage. 

The first plan for the survey that the EPI manager evaluates is a ‘‘conventional’’ stratified 
random sampling scheme. Coverage estimates are required for each of the 294 HP, and each 
estimate should have confidence bounds no larger than an absolute 10%, with α = 0.05. Since
the average HP population is approximately 2500, and since it can be estimated that 3.5% of
the population are children between the ages of 12 and 23 months, it is estimated that the number
of children available for sampling in each HP will be approximately 2500 × 0.035 ≈ 88. The
formula for sample size determination which incorporates a finite population correction is given
by Cochran (1977, p. 75) and results in n = 47.
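
The figure of 47 can be reproduced with the standard sample size formula for a proportion with a finite population correction. We assume here the common form n = n_0 / (1 + (n_0 - 1)/N) with n_0 = z²P(1 - P)/d², the worst-case P = 0.5, and z = 1.96; the function name is hypothetical.

```python
from math import ceil

def sample_size_proportion(N: int, half_width: float, p: float = 0.5, z: float = 1.96) -> int:
    """Sample size for estimating a proportion to within half_width in a population of size N."""
    n0 = z**2 * p * (1 - p) / half_width**2      # size ignoring the finite population
    return ceil(n0 / (1 + (n0 - 1) / N))         # finite population correction

children_per_hp = 88   # about 2500 x 0.035, as in the text
print(sample_size_proportion(children_per_hp, 0.10))  # 47
```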

Thus, in each of the 294 HP areas, 47 (53%) of the 88 children between the ages of 12 and 
23 months will be surveyed. In the entire country, 13,818 children in this age group will be 
surveyed. For the national estimate of coverage, P can be estimated to within 0.5% (assuming 
the worst level of coverage for precision (50%) and little variation in HP populations). 

The manager then considers a QAS scheme. It is decided that any HP that has a coverage 
level of 70% or lower is performing poorly, and should be identified for increased supervi- 
sion. The manager wants to be able to identify a HP with coverage of 70% with a probability 
of about 0.95, and HPs with lower levels of coverage with even higher probability. Several 
QAS schemes are considered and a double sampling scheme is proposed. 

The particular double sampling scheme proposed can be denoted as n_1:d_1 = 10:0 and
n_2:d_2 = 14:3. This means that in each HP area an initial sample of 10 children will be
surveyed for their immunization status. Regardless of how many children are found unimmunized,
all 10 will be surveyed. The number of children found unimmunized among each
HP sample of 10 children will be used to compute estimates for combined areas and ultimately
for the national estimate of coverage. If upon completion of a survey of the first sample of
10 children, none are found unimmunized, the HP will be categorized as having "acceptable"
coverage. If 4 or more children are found unimmunized, the HP will be classified as having
"unacceptable" coverage. In either scenario, no further sampling is required in the HP area.
However, if upon completing the initial survey, 1, 2, or 3 children are found unimmunized, 
a second sample of 14 additional children is drawn. During the survey of the second sample, 
whenever a total of 4 unimmunized children is reached (including those from the first sample 
of 10) the survey is stopped, and the HP area is classified as having ‘‘unacceptable’’ coverage. 
However, if upon completion of the second sample, a total of 3 or fewer unimmunized children 
have been found, the HP area is classified as ‘‘acceptable’’. 

Figure 3 shows the operating characteristic curve for this particular sampling scheme. This 
curve allows one to predict what the probabilities are for correctly classifying HP areas on the 
basis of the level of coverage. We will assume that the distribution of the 294 HPs is uniform 
and that all HPs in each decile have a coverage that corresponds to the mid-point value for 
each decile. If the probabilities of accepting a HP as having acceptable coverage are read from 
the OC curve and are applied to the numbers of HPs in corresponding deciles, it is possible 
to predict the number of HPs that would be accepted and rejected as having acceptable levels 
of coverage. The results of this projection are shown in Table 3. 
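
The double sampling rule and the projection just described can be sketched from the binomial distribution. The band midpoints and the assumption that every HP in a band sits at its midpoint follow the text; the function names are ours, and because the published figures were read from a plotted curve, the exact values computed here need not match Table 3 to the unit.

```python
from math import comb

def pmf(d, n, p):
    """Binomial probability of exactly d unimmunized children among n."""
    return comb(n, d) * p**d * (1 - p)**(n - d)

def accept_probability(p, n1=10, d1=0, n2=14, d2=3):
    """Pr(HP is classified "acceptable") when a fraction p of children is unimmunized."""
    prob = sum(pmf(d, n1, p) for d in range(d1 + 1))   # accepted on the first sample
    for first in range(d1 + 1, d2 + 1):                # 1, 2 or 3 found: second sample drawn
        allowed = d2 - first                           # further unimmunized children tolerated
        prob += pmf(first, n1, p) * sum(pmf(d, n2, p) for d in range(allowed + 1))
    return prob

# An HP at exactly 70% coverage (30% unimmunized) is flagged with probability ~0.94
print(round(1 - accept_probability(0.30), 2))

# Projection over coverage bands: (midpoint coverage, number of HPs)
bands = [(0.25, 36), (0.35, 37), (0.45, 37), (0.55, 37),
         (0.65, 37), (0.75, 37), (0.85, 37), (0.95, 36)]
for coverage, n_hp in bands:
    accepted = n_hp * accept_probability(1 - coverage)
    print(f"coverage {coverage:.0%}: expected accepted = {accepted:.1f} of {n_hp}")
```

Stopping the second sample early once a fourth unimmunized child is found changes the number of children visited but not the classification, so it does not enter the acceptance probability.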

As can be quickly computed from the expected results shown in the table, greater than 99% 
(183 of 184) of the HPs that had coverage less than 70% would be “‘rejected’’ (i.e., they are 
classified as having an unacceptable level of coverage). Of the 110 HPs that had coverage above 
Figure 3: Operating Characteristic Curve for Double Sampling Scheme
with n_1:d_1 = 10:0 and n_2:d_2 = 14:3


Table 3


Expected Classification of 294 HP with Use of Double Sampling
Scheme n_1:d_1 = 10:0 and n_2:d_2 = 14:3


                                   Number of HP Classified as:
Percentage Coverage    Number
in HP Area             of HP       >70% Coverage    <70% Coverage

20- 30%                  36              0               36
31- 40%                  37              0               37
41- 50%                  37              0               37
51- 60%                  37              0               37
61- 70%                  37              1               36
71- 80%                  37              7               30
81- 90%                  37             21               16
91-100%                  36             34                2
Total                   294             63              231


Number of HP with Coverage < 70% = 184.
Number Correctly Classified = 183 (99%).
Number of HP with Coverage > 70% = 110.
Number Correctly Classified = 62 (56%).
70%, 62 (56%) would be accepted (i.e., they are classified correctly as having an acceptable
level of coverage). Although a substantial portion of the HPs (48 of 110) that had coverage
higher than 70% would be incorrectly classified as having "low" coverage, it should be noted
that 63% (30 HPs) of them had coverage that was in the "marginal" range (i.e., coverage levels
in the 70-80% range).

Based on the initial samples of 10 children completed for each of the 294 HPs, a national 
estimate can be computed as with any stratified random sample. Using the same assumptions 
as were made for the ‘‘conventional’’ plan, the 95% CI for the national estimate of coverage 
from the QAS scheme would estimate P to within 1.8%, a level of precision that is adequate 
for the purpose of the EPI manager. 
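
The 1.8% figure is consistent with a simple worst-case calculation on the 294 × 10 first-sample children. The sketch below assumes equal HP populations, P = 0.5, z = 1.96, and no finite population correction; these simplifications are ours, and the function name is hypothetical.

```python
from math import sqrt

def half_width(total_n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% half-width for a proportion from total_n sampled children."""
    return z * sqrt(p * (1 - p) / total_n)

qas = half_width(294 * 10)          # first samples only: 10 children in each of 294 HPs
print(f"{100 * qas:.1f}%")          # 1.8%
```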

It should also be noted that the total number of children that would be surveyed in each 
HP area would vary between 10 and 24. In fact, with the particular distribution of coverage 
levels assumed in this example, the majority of HPs would be classified on the basis of the 
initial sample of 10 children (i.e., of the 184 HP with <70% coverage, about 98% would be
classified as unacceptable from the initial n_1:d_1 = 10:0 sample). Of the minority of HPs
which were not classifiable on the basis of the initial sample, few would require surveying all
14 children in n_2. Thus, the "average" number of children sampled across all 294 HP would
be substantially less than n_1 + n_2.

In conclusion, LQAS may have useful application in certain settings in which conventional 
stratified random sampling — requiring sufficient sized samples from each stratum to pro- 
duce useful confidence intervals for the estimates obtained — is too costly and/or time con- 
suming. LQAS is, in fact, nothing more than another way of interpreting data obtained with 
a stratified random sample with samples too small to provide meaningful confidence intervals. 
Because it may be possible to do such small sampling more frequently, the potential exists for 
establishing a system for continual monitoring of an activity, perhaps using staff who, with 
minimal training, could combine monitoring with their other field duties. One further advantage 
of the more frequent sampling could be that rather than concentrate on an age cohort that 
has passed through the full period of exposure to all immunizations, managers could instruct 
surveyors to collect information on children in the process of being immunized — i.e., deter- 
mine whether children have received the immunizations that are appropriate for their age. This 
would provide a means of obtaining information on more current activity, and afford an oppor- 
tunity to intervene in a more timely manner to improve coverage. 

Although confidence intervals will always provide much more information than a simple 
binary decision, the sample sizes required to obtain any useful level of precision on estimates 
for relatively small strata may be prohibitive. In such instances, an appropriate QAS scheme 
may be an alternative approach worthy of consideration. 
European Experience of Using Administrative Data 
for Censuses of Population: 
The Policy Issues That Must be Addressed 


PHILIP REDFERN¹ 


ABSTRACT 


The experience of the four Nordic countries illustrates the advantages and disadvantages of a register-based 
census of population and points to ways in which the disadvantages can be contained. Other countries 
see major obstacles to a register-based census: the lack of data systems of the kind and quality needed; 
and public concern about privacy and the power of the State. These issues go far beyond statistics; they 
concern policy and administration. The paper looks at the situation in two countries, the United Kingdom 
and Australia. In the United Kingdom past initiatives aimed at population registration in peacetime 
foundered and the present environment is hostile to any new initiative. But the government is going ahead 
with a controversial reform of local taxation that involves setting up new registers. In Australia the govern- 
ment tabled a Bill to introduce identity cards and an associated register, and advanced clearcut political 
arguments to support it; the Bill was later withdrawn. The paper concludes that the issues involved in 
reforming data systems deserve to be fully discussed and gives reasons why statisticians should take a 
leading part in the debate. 


KEY WORDS: Census of population; Identity cards; Personal reference numbers; Population registers; 
Record linkage. 


1. INTRODUCTION 


This paper has its origin in a study of alternative approaches to the census of population 
that I carried out for the Statistical Office of the European Communities (Redfern 1987). The 
study examined the experiences of the 12 member countries of the EEC together with Canada, 
Sweden and the United States. The study found that sample surveys can complement, but 
cannot replace, a 100 per cent census, because they do not provide reliable statistics for small 
areas. An important example of samples complementing a 100 per cent enumeration is the short 
form/long form censuses of Canada and the U.S. A sample survey complementing 100 per 
cent data from registers is in prospect in Norway (Section 3.3). 

Registers that contain addresses give figures for small areas; and, if the registers cover the 
census topics reliably (in terms of definitions, coverage, accuracy and timeliness) and can be 
linked, it is possible to create a record for each individual akin to his census return and so to 
conduct a register-based census: in essence administrative data are being recycled for statistical 
purposes. The pressure of costs and the burden of formfilling in the traditional census have 
persuaded the Nordic countries (Denmark, Finland, Norway and Sweden) to adopt this 
approach in whole or in part. 

Though administrative data can support a conventional census in a variety of ways (Redfern 
1987, paragraphs 3.65—3.67), it is their use in a register-based census that provides the first 
main theme of this paper. Section 2 describes the registers that are needed as a base for a census 
and Section 3 identifies the similarities and differences between the four Nordic countries in 
their approaches to this kind of census. Section 4 then considers the obstacles that other countries 
would face if they were to upgrade their record systems so as to make a register-based 
census feasible, and recognises that the issues raised concern administration and policy more 
than statistics. 

It is these wider issues that provide the second main theme of the paper. Section 5 looks 
in more detail at a country in which, for reasons of policy and ideology, administrative 
records are not coordinated through a population register: the United Kingdom. Section 6 
describes a recent initiative in Australia to improve administrative records. Finally Section 
7 summarises the political arguments for and against coordinating administrative records 
through population registers and puts the case for statisticians taking a leading part in debate 
on the subject. 


2. THE REGISTERS NEEDED AS A BASE FOR THE CENSUS 


2.1 Population Registers 


The essential starting point for a register-based census is a population register that includes 
personal reference numbers and addresses. The personal numbers must be in one to one cor- 
respondence with the members of the population. To keep the register up-to-date the citizen 
is obliged to notify changes. The personal numbers are also recorded in the files of the various 
administrative agencies, and so can be used to link records for statistical purposes. 

Population registration serves essentially administrative ends. It is an efficient way of 
organising the many dealings between public authorities, both central and local, and the indi- 
vidual citizen: for example taxes, social security, publicly-provided health services and elec- 
toral registration. To work effectively, population registration should serve a wide range of 
administrative activities, so that opportunities for updating and correction are frequent and 
the citizen becomes used to quoting his personal number. 

The key to the system is the central population register which records identifying informa- 
tion about each person (name, place and date of birth, date of immigration, marital status, 
and possibly items like parentage and citizenship) and his permanent reference number. In most 
countries the central population register includes up-to-date addresses, though the French 
Répertoire National d’Identification des Personnes Physiques does not. The basic 
administrative function of the central register is to act as reference point for administrative 
agencies which can check the identities of the individuals that they are dealing with and, as 
necessary, can correct or record the personal reference numbers in their own files. 


2.2 Other Key Registers in a Register-Based Census 


A register-based census of population and housing makes use of registers of other kinds 
of units than persons. The most important are a central register of housing and a central 
register of business enterprises and establishments (workplaces). Provided the housing reg- 
ister identifies each housing unit (and not just the building or the address) with a code that 
also appears as part of the address in the population register, then data on the housing unit 
in the housing register can be associated with data on the occupants in the population reg- 
ister: the two registers can be linked. Similarly a register recording each person’s employer 
and workplace can be linked to a central register of enterprises and establishments to show 
the person’s industry, commuting journey, etc. 
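
The linkage described in Sections 2.1 and 2.2 can be sketched in a few lines of Python. The record layouts, the field names (`person_no`, `unit_code`, `workplace_id`) and all values are hypothetical; the sketch only illustrates how shared keys allow separate registers to be combined into one census-like record per person.

```python
# Hypothetical registers, each keyed by its own identifier.
population_register = {
    "P001": {"birth_year": 1950, "unit_code": "U10"},
    "P002": {"birth_year": 1978, "unit_code": "U11"},
}
housing_register = {
    "U10": {"rooms": 4},
    "U11": {"rooms": 2},
}
employment_register = {"P001": "W7"}  # person_no -> workplace_id
establishment_register = {"W7": {"industry": "manufacturing"}}


def build_census_record(person_no):
    """Assemble one census-like record by following the shared keys."""
    person = population_register[person_no]
    record = {"person_no": person_no, "birth_year": person["birth_year"]}

    # Population register -> housing register, via the housing unit code.
    record["rooms"] = housing_register[person["unit_code"]]["rooms"]

    # Employment register -> establishment register, via the workplace.
    workplace_id = employment_register.get(person_no)
    if workplace_id is None:
        record["industry"] = None  # no employer recorded for this person
    else:
        record["industry"] = establishment_register[workplace_id]["industry"]
    return record


for person_no in population_register:
    print(build_census_record(person_no))
```

The housing link works only because the address in the population register carries the code of the housing unit itself, which is the condition stated in Section 2.2.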
3. CENSUSES IN THE NORDIC COUNTRIES 


The four Nordic countries have well-developed population registers of the kind described 
in Section 2.1. They have constructed, or propose to construct, central registers of building 
and housing to serve mainly administrative purposes. This section of the paper outlines the 
census of each country in turn and then summarises the directions in which Nordic census- 
taking is developing. 


3.1 Denmark 


Denmark is the only Nordic country — and I believe the only European country — to have 
switched completely from the conventional census to a register-based census. The switch was 
made in little more than a decade. The central population register with personal reference 
numbers was created in 1968 for administrative purposes, and a register-based census of popula- 
tion (but not housing) followed in 1976. A central register of buildings and dwellings was created 
in 1977, again mainly for administrative purposes, and a register-based census of population 
and housing followed in 1981. Another significant step in 1979-80 was to extend the return 
in which employers report each employee’s earnings to the tax authorities: employers with more 
than one workplace added each employee’s workplace to the return. This was done purely for 
statistical purposes and the statistical office has had to make a considerable effort to secure 
a good response. 

The registers held by Danmarks Statistik for statistical purposes, numbering some 37, pro- 
vide annual or more frequent statistics of population, employment, commuting, income, 
housing and construction for municipalities and sometimes smaller areas. But, because of the 
cost, analysis on the scale of a census takes place much less frequently: the next after the 1981 
census will take place in 1991 and even that may be on a lesser scale than 1981. 

The transition to a register-based census has been facilitated by the reorganisation of the 
Danish central statistical office in 1966. Danmarks Statistik was given a measure of inde- 
pendence of the central government, which could help to reassure the public on confidentiality. 
It was given powers to demand, and to use for statistical purposes, data held by public 
authorities for administrative purposes, and to participate in the construction of registers con- 
taining such data. 

The problems that Danmarks Statistik now faces concern mainly the quality and timeliness 
of data, both of which depend on the efficiency of administrative procedures. Thus the slowness 
in compiling tax authorities’ files — which provide data on industry, occupation, journey to 
work and income — delayed analysis of these topics in the 1981 census until summer 1983; 
and it is expected that statistics on the labour force will continue to lag at least a year behind 
the reference year to which they relate. Reliable data on occupation are particularly difficult 
to obtain because the topic is of little administrative interest; a main source is the information 
given by the taxpayer on his annual tax return. Despite problems of these kinds Danmarks 
Statistik takes the view that the register-based census has come to stay in Denmark because 
of the savings in cost and in burden on the public (Jensen 1983). 


3.2 Finland 


Register-based censuses have a long history in Finland. In the 1600s the parish registers 
recorded everyone over the age of 12 living in the parish, and in 1749 figures of the total population 
were compiled, analysed by age, sex, marital status and social class: one of the first-ever 
register-based censuses? Later censuses followed this pattern. The censuses of 1950 and 1960 
adopted the conventional method of collecting the information through questionnaires. But 
beginning with the 1970 census an increasing range of data has been extracted from registers. 
In the mid-decade census of 1985 the questionnaire asked only about economic topics: type 
of activity (if any) and occupational status, employer and workplace, occupation, and number 
of months worked in the past year. Data on housing were taken from the register of buildings 
and dwellings that had been created from 1980 census data and is updated with information 
from the municipalities. 

The 1985 census was planned to cost a little under the equivalent of 1 US dollar per person, 
or only a quarter of the cost of the 1980 census in real terms though covering the same range 
of variables. Factors that helped to make this possible included: mail-out of questionnaires 
preprinted with data on workplace (from the 1980 census) and occupation (from the central 
population register) — to be corrected by the respondent if necessary; mail-back to the cen- 
tral office with no local field organisation; only one reminder, with no follow-up of the 3.7 
per cent of forms which were not mailed back or were mailed back incomplete; and imputa- 
tion of missing data, where possible, using a variety of registers, one of which was pension 
records in respect of private sector employment. The final response rate to the questionnaire 
was 97.4 per cent, and by imputing missing data a final coverage of 98.6 per cent was achieved. 
Another reason for the low cost of the census is that part of the cost and burden has been 
transferred to the registration systems, including the annual field checks on the population 
registers by means of forms issued to each household/dwelling and quinquennial checks on 
the register of buildings and dwellings by means of forms sent to owners and occupiers. 

Comparisons between the 1980 census responses and register data on economic variables 
have been regarded as encouraging. This, and the methods developed in the 1985 census to 
impute the economic characteristics of non-respondents, open up the possibility that the 1990 
Finnish census might be wholly register-based. To fill one gap in register data, employers with 
more than one workplace will in future make a return of each employee’s workplace (Laihonen 
and Myrskyla 1987; Heinonen and Laihonen 1987). 


3.3 Norway 


The 1980 census of Norway was to a substantial extent register-based. It took data on basic 
demographic topics, income and completed education (other than education abroad) from 
registers. These data were complemented by means of a mail-out mail-back questionnaire to 
each person aged 16 and over on economic topics, education abroad, country of birth, religious 
affiliation and housing. All persons in the same household were to return their forms, together 
with one housing form, in the same envelope, thus defining the composition of the household 
for census purposes. 

For several reasons it is not feasible to switch to an entirely register-based census in 1990. 
First, register data on some important census variables do not conform to desirable statistical 
definitions or are not of sufficient quality for census purposes (this applies for example to 
industry); and register data for other variables do not exist (for example occupation). Second, 
the development of the register of land property, addresses and buildings (the ‘‘GAB’’ register), 
begun in 1983, is unlikely to be far enough advanced by 1990 to provide housing data 
for the census. Third, because the link between the GAB register and the population registers 
is the address, it is not possible to identify household composition or to associate housing 
characteristics with personal characteristics when two or more housing units have the same 
address. 
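
The third difficulty can be made concrete with a small Python sketch. The data and the function name `link_by_address` are hypothetical; the point is only that when the address is the sole linking key, a person can be assigned to a housing unit only where the address contains exactly one unit.

```python
from collections import defaultdict

# Hypothetical extracts: (identifier, address).
housing_units = [("U1", "1 Main St"), ("U2", "1 Main St"), ("U3", "5 Oak Rd")]
persons = [("P1", "1 Main St"), ("P2", "1 Main St"), ("P3", "5 Oak Rd")]


def link_by_address(persons, housing_units):
    """Link persons to housing units using the address as the only key."""
    units_at = defaultdict(list)
    for unit, address in housing_units:
        units_at[address].append(unit)

    linked, unresolved = {}, []
    for person, address in persons:
        units = units_at.get(address, [])
        if len(units) == 1:
            linked[person] = units[0]   # unambiguous: one unit at the address
        else:
            unresolved.append(person)   # several units (or none) share the address
    return linked, unresolved


linked, unresolved = link_by_address(persons, housing_units)
print("linked:", linked)
print("unresolved:", unresolved)
```

In this example P1 and P2 cannot be placed in a specific unit, so neither their housing characteristics nor their household composition can be derived from the registers.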

In the 1990 census data from registers will again be used for basic demographic topics, income 
and completed education (other than education abroad). A method is being developed for con- 
verting register data on most of the economic variables to statistically-desirable definitions by 
reference to the results of an enquiry addressed to a 10 per cent sample of persons aged 16 
and above (100 per cent in municipalities with populations under 6,000). The register data 
for a sub-population would be adjusted in part using sample data for the sub-population and 
in part using sample data for a wider population — a procedure that would partially elimi- 
nate the bias in the register data. The sample enquiry would be the only census source for topics 
for which no register data exist, including occupation and probably housing and household 
composition. 
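
The idea of adjusting register data partly from the sub-population's own sample and partly from a wider sample can be illustrated with a simple composite correction. The sketch below is only an illustration: the actual Norwegian procedure is not described in this paper, and the function name `adjusted_estimate`, the weight and all figures are invented. Bias is taken here to mean the register proportion minus the questionnaire proportion among the sampled persons.

```python
def adjusted_estimate(register_value, local_bias, wide_bias, weight):
    """Correct a register proportion with a weighted mix of two bias estimates.

    local_bias: bias estimated from the sample within the sub-population
    wide_bias:  bias estimated from the sample within a wider population
    weight:     share of the correction taken from the local estimate (0 to 1)
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    correction = weight * local_bias + (1.0 - weight) * wide_bias
    return register_value - correction


# Invented figures: the register shows 30 per cent in some category; the
# local sample suggests an overstatement of 4 points, the wider sample 2.
estimate = adjusted_estimate(0.30, local_bias=0.04, wide_bias=0.02, weight=0.25)
print(f"adjusted proportion: {estimate:.3f}")
```

A small weight on the local estimate reduces sampling variance in small sub-populations at the price of removing only part of the local bias, which is consistent with the remark that the bias would be partially eliminated.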

This approach — the use of registers plus a 10 per cent sample enquiry — is estimated to 
cost 60 per cent of the cost of a census on 1980 lines. The penalties would be the sampling 
variance, which would be greatest for topics for which no register data exist, and also some 
bias in the case of topics for which register data exist but are not of the quality needed for census 
purposes (Heldal et al. 1987). 


3.4 Sweden 


Over the past two decades the balance of the Swedish census has changed: in 1970 most of 
the data came from questionnaires and a few from registers, but in 1985 the position was 
reversed. In 1985 the mail-out mail-back questionnaire to each person aged 16 and over (or 
married couple) asked only (1) whether the person was economically active in a specified week 
and, if so, the occupation, (2) the household composition — a list of the adults who live in 
the dwelling and (3) housing questions. It was possible to omit questions asked in the preceding 
census on the name of the enterprise at which the person was employed, the workplace and 
the industry, because from 1985 the annual returns that employers make to the tax authorities 
giving each employee’s earnings were extended to show the employee’s workplace. But the topic 
hours of work was dropped from the 1985 census when employers resisted the proposal to 
include this too on the annual returns. 

After the 1980 census a study had been made of the steps that would have to be taken if 
the 1985 census were to be wholly register-based. The steps included: 


(1) The use of data on occupation from the forms on which employed persons report changes 
in income to the national insurance offices. 


(2) The creation of a register of household composition, which would be updated by asking 
for more information when a person moved house. 


(3) The creation of a register of buildings that contain housing units, to be updated by the 
municipalities. 


(4) The creation of a register of completed education, to be updated with information from 
educational institutions on new graduations. 


But, as already noted, a questionnaire was retained in the 1985 census mainly because of doubts 
about the quality of information that could be obtained from registers on occupation, household 
composition and housing. Of the proposed new registers only the register of completed educa- 
tion is as yet under construction. But a committee is studying the possibility that the record 
of a person’s address in the population registers should include the housing unit and not just 
the property — an essential step in linking population registers to housing registers. 

A Parliamentary Commission is reviewing the 1985 census, particularly aspects concerning 
privacy and confidentiality. Its findings will be one of the factors shaping the 1990 census. 




3.5 Summary of Nordic Census-Taking 


The four Nordic countries are developing their censuses along different paths but there are 
many features in common: 


(1) All have as a starting point accurate registers of population which give regular and reliable 
statistics of population for small areas. 


(2) All wish to maximise the use of information in other registers and to minimise the burden 
of formfilling on the public. All are striving to contain or reduce costs. 


(3) All recognise the problems of definition, quality and timeliness of the information in 
registers, particularly for economic topics. Employers’ returns are being extended to give 
information on each person’s workplace, and hence on industry — though extensions 
for purely statistical purposes are unwelcome and may yield data that are of poor quality. 
Register data on occupation are generally unreliable. And data on some topics, such as 
method of travel to work, do not exist in any register. 


(4) Registers of buildings and houses have been created or are proposed. But it is difficult 
to keep the registers up-to-date, whether by using information available to the munici- 
palities or by collecting information directly from owners. In some countries the registers 
need to be further refined to identify each housing unit in a way that permits a link with 
the address information in the population registers. Another problem is how to get data 
on household composition from registers if, as in Sweden, the household is not defined 
as all the occupants of the housing unit. 


All four countries appear ready to sacrifice something in the quality of the census results 
in order to cut costs and the burden on the public. But they differ in their approaches. Den- 
mark has gone the farthest by abandoning the census questionnaire. Because of doubts on the 
quality of some register data, particularly on economic topics, the 1985 censuses in Finland 
and Sweden retained a limited questionnaire, and the responses were linked to demographic 
and other data taken from registers. But the possibility is foreseen of making the 1990 census 
of Finland wholly register-based. In Norway, where there was no mid-decade census, the 1990 
census is expected to retain a questionnaire on at least economic topics but, to reduce costs, 
the questionnaire may be sent only to a 10 per cent sample of persons; where register data for 
economic topics exist, though imperfect, they could be converted to statistically-desirable defini- 
tions by reference to the sample data. A valuable account of Swedish experience of using 
registers as a census source has been given by Johansson (1987). 


4. THE FEASIBILITY OF A REGISTER-BASED CENSUS IN OTHER COUNTRIES 


The two main forces that have driven the Nordic countries towards a register-based census — 
the need to cut costs and the burden of formfilling — have been strongly at work elsewhere. 
They show for example in a halt, and sometimes a reversal, of the pre-1980 trend to longer 
census questionnaires. 

A new and disturbing feature, public protest, disrupted the census in two countries. In the 
Netherlands the plans for a 1981 census were abandoned. The census in the Federal Republic 
of Germany planned for 1983 had to be postponed to 1987 because of more stringent condi- 
tions on confidentiality laid down by the Constitutional Court, and even then there was some 
non-cooperation. No country can feel itself secure against this kind of challenge. But a register- 
based census is less likely to be sabotaged provided it does not have to be supplemented by 
a questionnaire. This is because there is no occasion (Census Day) when everyone is faced with 
a questionnaire and the protests of a minority can be fanned into large-scale opposition. 
If the register-based census is so much cheaper with less burden on the public and less risk 
of sabotage, why do so few countries see it as a viable methodology? There are three main 
reasons. First, for some topics, particularly economic topics, administrative data may be of 
poorer quality than data collected through questionnaires; and for other topics no 
administrative data exist. The Nordic countries recognise these shortcomings, and so some have 
retained a questionnaire and linked the responses to the data from registers (Section 3.5). 

Second, many countries do not possess the necessary data systems of the kind described 
in Section 2. For example, local population registers may exist but without a central popula- 
tion register, as in the Federal Republic of Germany, Greece and Italy. The population registers 
may not be up-to-date and indeed some countries rely heavily on the canvass for a conven- 
tional census of population to update the registers (Italy and Spain). Outside the Nordic group, 
the Benelux countries have, or are likely soon to have, the data infrastructure needed for a 
register-based census. 

The third main obstacle to a register-based census follows from the second. If the data 
systems have to be radically improved — and particularly if there has to be wider use of personal 
numbers and a new obligation to notify each change of address — opposition may be expected 
from politicians and the public on grounds of privacy and erosion of freedom. There may be 
doubts too whether the public would cooperate in the bureaucratic disciplines of a good reg- 
ister system. In addition, even when the necessary data infrastructure is in place, its use for 
record linkage for census or other statistical purposes could be sensitive. These are important 
issues but they go far beyond statistics. They concern policy and administration. They are now 
discussed by reference to the experience of the United Kingdom. 


5. RECORD SYSTEMS IN THE UNITED KINGDOM 


Decennial censuses in the United Kingdom use conventional methods. The 1981 census was 
probably the most successful census since the Second World War — a success that was helped 
by the shortened form and the omission of a controversial question on ethnicity. So three factors 
combine to make a register-based census seem a rather remote possibility: the 1981 success; 
doubts about the range and quality of statistics that could be extracted from administrative 
records; and the absence of a population register to coordinate the record systems. 

But statisticians have recognised the benefits, both administrative and statistical, that popula- 
tion registers could bring. The two initiatives on this subject in the past 70 years — both of 
which failed — are described in Sections 5.1 — 5.4. Now the government, while opposing a 
central population register, is introducing a limited form of local population register as part 
of a controversial reform of local taxation (Section 5.5). 


5.1 National Registration in Two World Wars: The 1918 Committee on Registration 


Thinking in Britain about population registers goes back over seventy years to the First World 
War. The National Registration Act of 1915 had obliged every adult to carry a National 
Registration Certificate and to register every change of address. This led Sir Bernard Mallet, 
Registrar General, to consider a permanent system, which he outlined in his Presidential address 
to the Royal Statistical Society in November 1916 (Mallet 1917). But he was aware that he might 
be criticised for ‘‘desiring to Prussianise our institutions’’. 

These ideas were developed in the report of a committee appointed by the government in 1918 
and chaired by Sir Bernard Mallet. Many years later he reviewed the findings in his Presidential 
address to the Eugenics Society (Mallet 1929). What he then said remains true today: 
‘‘We found in existence in England a very considerable number of registers being kept at 
considerable expense for various special purposes, some of them covering very large sections 
of the population. These registers are kept under different Acts of Parliament, by various 
authorities, in varying areas, for independent purposes, without any provision for their coor- 
dination one with another’’. 

The committee proposed continuous registers of the population kept locally and associated 
with identity cards. A central index register would interrelate the local registers to deal with 
removals and to prevent duplicate entries. This registration system would coordinate the 
registers kept for special purposes — electoral registers, school attendance registers, the decen- 
nial census, registers of births, marriages and deaths, etc. It is noteworthy that the committee, 
reporting nearly seventy years ago, proposed that the census of population should be linked 
to population registration. 

In his 1929 address Sir Bernard Mallet set out the principles to which any good system should 
conform: first, the accurate identification of every individual ‘‘in order (a) that he shall be made 
responsible for the fulfilment of his obligations to the community and (b) that he shall be 
ensured his rights as a citizen, whether these take the form of franchises to be exercised or dues 
to be received’’; second, the acquisition of statistical information and in particular regular 
figures of the populations of local areas. The analysis made and the proposals that followed 
would still stand as a valid response to the situation that we face in the United Kingdom today, 
though some of the features would not be acceptable now. Thus: 


‘‘the numerous official enquiries and registers, now made and maintained independently 
of each other, would be coordinated into a single system which would provide a dossier 
for each individual containing those particulars regarding him which the State is concerned 
to know’’ (Mallet 1929). 


To Sir Bernard Mallet’s regret the recommendations in his committee’s report were not car- 
ried out and, with the demise of the temporary wartime legislation, national registration ceased 
until the outbreak of the Second World War. 

During the Second World War and for a few years after a full system of population registra- 
tion operated in Britain. A National Register was set up linked to the issue to each person of 
an identity card bearing his identity number and address. Local registers were coordinated 
through a central register which held each person’s name, date of birth, identity number and 
a code for area of residence. A person had to notify changes of address to the local register. 
The National Register survived until 1952 when identity cards and the obligation to notify 
changes of address were abandoned in a post-war spirit of ‘‘set the people free’’. 


5.2 The National Health Service Central Register 


The central register set up in 1939 during National Registration has been maintained since 
1952 to serve a more limited role in the running of the National Health Service (NHS). Renamed 
the National Health Service Central Register (NHSCR), it now includes everyone resident in 
Britain apart from the 1 or 2 per cent who were born abroad and who have never registered 
on the patient list of a doctor in the NHS. But the NHSCR does not fill the role of a central 
population register of the kind found in many countries in Northern Europe because it is not 
used as a reference point from which other agencies can check personal identities and can carry 
the personal reference numbers into their own files. Indeed the identity numbers recorded in 
the NHSCR serve only NHS purposes. Other limitations which would inhibit the wider use 
of the NHSCR are: 
(1) A significant proportion of the data arriving at the NHSCR do not carry the identity 
number and, given the difficulty in using names and dates of birth as unique identifiers, 
some of these data cannot be linked to already existing NHSCR records; thus some 1 
or 2 per cent of the deaths notified to NHSCR cannot be linked in. This and the failure 
to remove all emigrants from the register are main factors in the inflation in the reg- 
ister, currently estimated at about 5 per cent. But this figure should reduce shortly when 
the register is computerised. 


(2) Addresses are held in full in local registers and as area codes in the NHSCR. But in most 
cases changes of address are recorded only when a person registers with a new doctor 
— which may occur years after the person has moved house. 


5.3 The Wide Range of Registers in the United Kingdom 


As in any other developed country, a wide range of registers containing personal data is 
held by public authorities in the United Kingdom. The main ones concern vital registration 
(births, deaths, marriages and divorces), immigration and naturalization, the national health 
service, social security (contributors and beneficiaries such as the unemployed, pensioners and 
children), personal taxation, passports, electoral lists, the ownership of cars and licenses to 
drive cars. But these registers are maintained independently of one another by the different 
agencies, each with its own personal numbering system. (An exception is the joint arrangements 
for collecting employees’ social security contributions and income tax under Pay-As-You-Earn, 
using one set of personal numbers, the National Insurance numbers.) This case apart, there 
is no coordination of record systems, no consistency in the content of records and no single 
set of personal numbers in general use. Details of a person’s identity, usually name and date 
of birth, may differ between one register and another or even within the same register. This 
causes duplication and makes linking between registers for statistical purposes uncertain and 
costly. Information on address is even less consistent. There is no mechanism for carrying 
updating information simultaneously into all relevant records, for example information on 
change of address, change of name on marriage, or even the fact of death. In the words of 
Sir John Boreham, then head of the Government Statistical Service (GSS), ‘‘the information 
is never properly brought together ... It’s all rather ramshackle’’ (Boreham 1985). 


5.4 The 1960s Study of Registers 


The existing uncoordinated system of records is inefficient for administration; and the 
absence of up-to-date addresses and the inability to link records are severe handicaps for 
statistics. And so in the late 1960s the GSS looked for a remedy. It studied the case for replacing 
the variety of personal numbering systems by a single set of personal numbers to be held in 
a central register, which might also include up-to-date addresses (Penrice et al. 1968). But
Ministers decided that these ideas were politically unacceptable and terminated the studies 
(House of Lords 1969). 


5.5 The Registers for the New Community Charge 


It would seem that one of the biggest obstacles to the creation of a population register in 
Britain has now been overcome: an obligation has been laid on the citizen to report changes 
of address. Despite this, no effective population register will be created. The government has 
set its face against that. 

The new obligation to report changes of address, a revolutionary departure from peacetime
traditions in Britain, stems from the government’s decision to change the basis of local
taxation. In the past local taxes have been levied on the occupiers of property on the basis of the
property’s rental value. The tax on the occupier of a dwelling is now to be replaced by a flat 
rate tax on each person aged 18 and over living in the dwelling: the Community Charge (CC). 
To administer the tax new local registers will be maintained listing addresses and the persons 
aged 18 and over resident there. Though the registration officer will be able to make enquiries 
and to call on information held by local authorities and housing bodies and in electoral rolls, 
the obligation to inform him of changes to the register is laid on the individual. Legislation 
has been enacted to introduce the new system in Scotland with effect from April 1989 and in 
England and Wales from April 1990. 

But the CC registers will be primitive instruments compared to the population registers in 
the Nordic and Benelux countries because: 


(1) The CC registers will not cover everyone; in particular they will not cover the under-18s 
and people living in boarding houses and institutions. 


(2) The registers (which will record each person’s name, address and, in Scotland only, date 
of birth) will be maintained locally with a limited degree of standardisation of procedures. 
There will be no central register to standardise the description of each person’s identity 
and to coordinate the local registers (for example to facilitate transfers between 
authorities). 


(3) Although the legislation makes no specific provision for including a personal reference 
number in the registers, a report had recommended that local authorities in Scotland 
should create such a number and suggested a possible algorithm for this based on name 
and date of birth (Chartered Institute of Public Finance and Accountancy 1987). But 
the recommendation is not being implemented. 


(4) The legislation specifies who can have access to which parts of the register. Apart from 
local authority access for the purpose of administering the CC: an individual can inspect 
the entry relating to himself; the public can inspect the list of addresses and the names 
of persons relating to each address (but, to quote the Scottish legislation, ‘‘not so as to 
ascertain whether that person resides at that address’’); and the Electoral Registration 
Officer has access for his purposes. No other access is permitted. 


The government’s rejection of a population register that would coordinate administrative 
records is spelt out in the Green Paper on the CC scheme (Her Majesty’s Government 1986). 
The paper cites countries that ‘‘have unified their separate registers and use them for several 
different central administrative purposes’’. It goes on ‘‘The British tradition is different. 
Registers are kept separately for different purposes by the body which needs them for a par- 
ticular purpose. ... There will be no national register.’’ This contrast between other coun- 
tries’ practices and United Kingdom practice is mistaken, because in other countries the different 
agencies maintain separate registers but call on a central register in order to identify the 
individuals that they are dealing with. I would judge that the statement ‘‘There will be no 
national register’’ reflects a political axiom, not the conclusion of rational analysis. 

The creation of the CC registers is perhaps a missed opportunity to set up an effective popula- 
tion register. But the CC scheme is not an ideal vehicle for that. If it is to be effective, popula- 
tion registration should serve many ends, the more the better, and not just one — particularly 
when the single purpose is to levy a tax which many will feel onerous and many may try to 
avoid. Moreover the CC is politically controversial because of its differential impact on various 
groups in the community: in general terms a transfer of resources from the poor to the rich. 

Thus there are several reasons for questioning the operational effectiveness of the registers
to be set up under the CC scheme: the single purpose and controversial aim of the registers; 
the incomplete coverage of the population (the omission of some groups); the lack of a cen- 
tral register to coordinate the local registers; and the reliance on a person’s name and (in 
Scotland only) date of birth as identifiers rather than a permanent personal number. The local 
authorities have made some critical observations on the problems that they will face in attemp- 
ting to set up the registers (Rating and Valuation Association 1987). It looks as though the 
government has embarked on new tax legislation without thinking through the practicalities 
of implementation. 

Another worrying feature of the CC scheme is its effect on response to the 1991 census of 
population. Many of those who evade CC will probably try to evade the census too, not trusting 
the census authorities’ assurances that census data will not be passed on to other agencies. And 
if the census form is too explicit by stating ‘‘YOUR INFORMATION WILL NOT BE PASSED
TO THE AGENCIES DEALING WITH TAX, SOCIAL SECURITY, COMMUNITY 
CHARGES, ...’’, will the census authorities themselves be seen to be condoning or even 
encouraging evasion and fraud? 


5.6 The United Kingdom Environment 


Leaving aside the CC, the present environment in the United Kingdom is generally hostile 
to the idea of population registers. But two positive features may be mentioned. First, the Data 
Protection Act, 1984 introduced safeguards for personal data held on computers on the lines 
of the Council of Europe’s Convention of 1981 (Council of Europe 1981). In fact the govern- 
ment’s primary aim in introducing the 1984 legislation was commercial: to establish the United 
Kingdom as a safe place in the eyes of other countries which might be considering transmit- 
ting their data to the United Kingdom for processing. Protection of privacy was a lesser aim. 
Second, the GSS, which would be concerned with some aspects of the working of population
registers, has established an unquestioned record of protecting data; it has published a code 
of practice (Government Statistical Service 1984). Integrity in handling data has been under- 
pinned by the fact that the GSS is decentralised, so that legal and administrative barriers have 
prevented the exchange of data even for statistical purposes. Such barriers would have to be 
removed if the statistical fruits of population registration were to be secured. 

On the other side of the balance sheet the GSS’s dependence on central government con- 
trasts with the relative autonomy of the statistical organisations in, for example, Denmark and 
the Netherlands; this could lessen public confidence in its handling of data. The GSS’s image 
as a creature of central government has been intensified by the Rayner Reviews of the early 
1980s, as a result of which the GSS was instructed to give greater priority to the needs of cen- 
tral government at the expense of the needs of others — the local authorities, business, 
academics and the general public. 

A main obstacle to population registers in the United Kingdom is the public’s traditional 
resistance to governmental actions that appear to be overbearing or bureaucratic. The privacy 
lobby can be relied on to lead the opposition to any new reporting obligations placed on the 
public, to any extensions of the government’s holding of personal data or to any project for 
linking data. The opposition overlooks the costs and injustices that result from inefficient 
management of data; and it overlooks or undervalues the checks on the misuse of personal 
data that can be provided by legislation on data protection and freedom of information — 
if properly implemented. In recent years fears about giving more personal data to the govern- 
ment have been reinforced by the public’s perception of the style of government: the United 
Kingdom government is seen as almost obsessively secret and as seeking to concentrate power 
in its own hands. Thus, not only is there no Freedom of Information legislation in the United
Kingdom, but all government information has, in principle, been protected by the catch-all
Official Secrets Act, 1911 (superseded in May 1989 by a more narrowly worded Act). Peter
Hennessy, editor of Contemporary Record, asserts that British governments ‘‘maintain the 
tightest system of administrative secrecy in the western world’’ (Hennessy 1987). And recent 
events have called into question the proper accountability of the security services. Writing of 
the whole range of government activity, William Plowden, Director General of the Royal 
Institute of Public Administration, said ‘‘a modern British government, supported by an ade- 
quate majority in the House of Commons, at little risk from the rubber-toothed bulldogs of 
the select committees and entrenched behind the Official Secrets Act, is one of the least accoun- 
table executives in the developed world’’ (Plowden 1987). 

So the public is suspicious of any new scheme of population registration. And, as already 
noted, opposition to full registration has been expressed by the present administration, which, 
like its counterpart in the United States, has made determined efforts to ‘‘get government off 
our backs’’. One of the administration’s major policy objectives has been to reduce the size 
and influence of the public sector — sometimes giving a higher priority to this than to 
cost-effectiveness. So public concern about privacy, political ideology and scarce resources 
combine to block a full register which could lead to substantial savings and to a fairer and more 
just society. In fact there has been no balanced presentation of all the issues, and so no public 
discussion of them, in the past half century. 


6. AN AUSTRALIAN INITIATIVE: IDENTITY CARDS 


I know little about the Australian temperament or the Australian political scene, but I guess 
that resistance to bureaucratic government is as strong there as it is in the United Kingdom. 
Even so, the Australian government introduced a Bill to issue each citizen with an identity card 
— the Australia Card (AC). The reasons were wholly administrative: to reduce tax evasion, 
to reduce social security fraud and to reduce illegal immigration. The AC would carry the 
person’s name, his photograph, his signature and an AC number (personal reference number) 
but not address. It would be backed up by an AC register (which would also include address 
and date of birth) accessible only to certain government departments. 

The Australia Card Bill, 1986 was passed by the House of Representatives but was rejected 
by the Senate (in which the government party did not have a majority). The rejection was given
as one of the reasons for calling the July 1987 general election and, following the electoral 
success of the government party, the Bill was due to come before Parliament again. But the 
Bill was withdrawn because of a serious legal flaw. However it is worth describing the Bill’s 
provisions. 

The AC register would be a central population register. But it would be less developed than 
those in Northern Europe for two main reasons: 


(1) The Bill did not place an obligation on the citizen to notify each change of address. The 
hope was, I understand, that most changes of address would be picked up by one or other 
of the government agencies taking part in the scheme and would then be passed on to 
the AC register. 


(2) The AC scheme would not be as multi-purpose as several of the population registers in 
Europe. As a result of concerns about privacy and uncontrolled linking of data, the AC 
register would be accessible only to the government agencies dealing with tax, social 
security and health insurance, and then only to check identities. 


The Bill defined the situations in which a person could be required to produce his AC; these
included making any of a wide range of financial transactions, entering a new employment, 
claiming Medicare or social security benefits, and receiving hospital treatment. It would be 
illegal to require a person to produce his AC in any other situation. 

As a further safeguard on privacy the Bill provided for a Data Protection Agency. How- 
ever the government argued that privacy had to be balanced against the losses to government 
funds through tax evasion and fraud. The government estimated that the costs to government 
of the AC scheme would be $0.8 billion over ten years, but that this would be offset many times
over by savings of $4.1 billion in tax and $1.4 billion in social security, giving a net saving over 
the ten years of $4.7 billion (Australian House of Representatives 1986). 

Remarks made by the Minister of Health in Parliament (Australian House of Representatives
1986) show what Ministers were trying to achieve and the clear political commitment: 


‘‘I bring before Parliament today ... a long overdue reform to provide fairness and equity
for all Australians.’’


‘‘No one doubts that the Australia Card will check tax evasion; no one doubts that
it will contribute to the integrity of our social security system; no one doubts that
it will be a useful weapon in deterring illegal immigration; no one doubts that by
facilitating the pursuit of the money trail it will provide an invaluable instrument
against corporate and organised crime.’’


‘‘Irrefutably, citizens need to be protected against abuse of their privacy by government.
But equally citizens need to be protected against others who cynically hide
behind the mantle of privacy to create false identities and thus defraud the community.’’


‘‘It is inevitable that this country will establish an identification system before the
century is out.’’


Though the AC Bill has now been withdrawn, the government is searching for other ways 
to clamp down on tax and social security fraud, and so the story is not yet ended. 


6.1 Identity Cards 


The main emphasis in the Australian scheme was placed on the identity card as a way of 
checking identity, rather than on the personal number and register. Some European systems 
also combine the issue of identity cards with population registration; the Belgian system is one 
of the most highly developed. And undoubtedly the identity card provides an extra tier of 
security — provided it is not forged or stolen. In some countries identity cards are unconnected 
with population registration, for example in France. 

In countries unaccustomed to identity cards in peacetime, the card is seen as a symbol of 
an authoritarian régime and an affront to civil liberties. That may be one of the reasons why 
the AC scheme generated so much public opposition in Australia. But much of the benefit from 
population registers can be secured without identity cards provided that citizens know their 
personal numbers and quote them in dealings with public authorities. This is what happens 
in Denmark and Sweden where population registration is effective, both administratively and 
statistically, without issuing identity cards to everyone. 

A country like the United Kingdom ought not to shy away from correcting the incoherence 
of its records just because the uninformed critic might equate the necessary remedy — popula- 
tion registration — with what is only an optional extra — identity cards. 


7. CONCLUDING REMARKS


Setting up a population register, with up-to-date addresses and personal reference numbers 
that are also carried into administrative files, would in fact be little more than bringing order 
into an existing ‘‘ramshackle’’ system: even in the most ramshackle system the citizen has to 
identify himself and inform various agencies of a change of address. Nonetheless some people 
are deeply worried by the prospect of a population register because of its threat to privacy and 
freedom and because it gives increased power to the State with all the dangers of misuse by 
an authoritarian or oppressive government. But specific remedies can and should be put in 
place: an effective data protection régime and legislation on freedom of information. 

On the other hand a properly coordinated record system would have political advantages 
that have been largely overlooked. At the top of the list I would put two things: 


(1) A brake on fraud, crime and illegal immigration. 


(2) A fairer society, so that burdens and duties are fairly shared and benefits and rights go 
only to those entitled to them. Put another way, freedom should not extend to the 
freedom to cheat the rest of the community. 

Rather lower down the list I would put: 


(3) The financial savings to government. More accurate records will cut the costs of admin- 
istration, give a higher yield of tax and reduce the amount of benefits paid improperly 
— illustrated by the Australian figures (Section 6). 


(4) A wider range of policy options for government. Thus, if a reliable population register 
were already in place in the United Kingdom, the government would not have to con- 
struct a register ad hoc in order to launch its Community Charge scheme; and it could 
regulate immigration through control on residence in addition to the controls at airports 
and seaports. 


(5) Other benefits from more reliable checks on identity. The late Registrar General gives 
as an example better checks on a couple’s eligibility to marry. There would also be fewer 
different reference numbers to be quoted and perhaps fewer plastic cards to be carried. 


(6) Better statistics (but see a qualification below). 


This list is one answer to the charge that a population register is totalitarian and Big Brother. 
Without safeguards and in the wrong hands it could be. But it could also be the key to a fair 
and just society. The question is: what kind of society do we seek? Is it one that encourages, 
or at least turns a blind eye to, fraud, tax evasion and crime? Australian Ministers cite the man 
who was convicted for collecting over 50 separate unemployment benefit cheques each fortnight 
(Australian House of Representatives 1986). In the United Kingdom a Member of Parliament 
and barrister was convicted in 1987 for making multiple applications for shares against the 
rules by using different names, addresses and bank accounts; the defence was that it was 
common practice. 

Another answer to the charge of totalitarianism is to look at the population registers in other 
countries. Table 1 divides 15 countries — all the countries of Western Europe except Austria 
and Switzerland — into four groups according to the kind of register system that each has. The 
six countries in group A have the most effective systems: their administrative records are
coordinated by the population registers. The four countries in group B are in an intermediate position.
In the three countries in group C population registers exist only at the local level and their quality 
is sometimes poor. Finally Ireland and the United Kingdom are in group D at the least devel- 
oped end of the spectrum. If the United Kingdom were to take what I believe is a rational and 
realistic course and move into group A, it would not be joining a totalitarian company. 


Table 1
Particular Features of Population Registration in 15 Countries¹

                                                  A Central
                                    Local         Population Register   Personal
                                    Population    which Coordinates     Reference
                                    Registers     Administrative        Numbers
                                                  Records

A. With a Full System
   of Population Registration
     Belgium                        x             x                     x
     Denmark                        x             x                     x
     Finland                        x             x                     x
     Luxembourg                     x             x                     x
     Norway                         x             x                     x
     Sweden                         x             x                     x

B. Intermediate Group
     France                         •             x                     x
     Netherlands                    x             •                     x
     Portugal                       •             x                     x
     Spain                          x             x                     x

C. With Local Population
   registers only
     F. R. of Germany               x
     Greece                         x
     Italy                          x             •                     •

D. Without Population Registers
     Ireland                        •             •                     •
     United Kingdom                 •             •                     •

Number of Countries
with the Feature                    11            8+                    10

1 For details see Redfern 1987. 


The statement noted earlier (item 6) that a properly coordinated record system will lead to 
better statistics needs to be qualified. Better statistics are indeed the direct consequence; a good 
example is regular and reliable population statistics for small areas. But if, as an indirect 
consequence, irresistible pressure builds up to replace a conventional census by a wholly
register-based census, there are both benefits and penalties. Against the benefits of lower costs, 
a smaller burden on the public and a lesser risk of sabotage has to be set the probable deteriora- 
tion in the range and quality of census results on economic topics, housing etc. Thus 
administrative records may increasingly fail to reflect the complexities and informalities of 
present-day life-styles which a conventional census could attempt to record — for example more 
part-time employment and self-employment, more second homes and looser family and 
household ties. It is here that Nordic experience (Section 3) is relevant. 

Statisticians are not likely to underestimate the value of better statistics. But policy and 
administration — political considerations — carry a bigger weight in the arguments for and 
against population registers. The arguments need therefore to be debated by policy-makers, 
politicians and the public. In the United Kingdom a debate ought to take place on the wisdom
— indeed the feasibility — of constructing the single-purpose CC population register 
deliberately disconnected from other registers, rather than a multi-purpose population reg- 
ister with all the benefits that that could bring. 

But I believe it right to bring the subject before statisticians for three reasons. First,
statisticians understand both the technical problems and the wider issues, and so can give a lead. Thus,
in the United Kingdom both the earlier initiatives on population registers were taken in a 
statistical-cum-registration context (Section 5). Second, statistical agencies may be given respon- 
sibility for the key coordinating mechanisms, in particular the central population register, as 
INSEE has in France and SSB in Norway. Third, statisticians would benefit from more reliable 
data. 

I hope therefore that statisticians will make their views known. Registers are very much a
live issue, not least in such ‘‘under-developed’’ countries as the United Kingdom and Australia.
Statisticians working in government service should reflect on the comment on professional ethics 
offered to the US Bureau of the Census; the words were written in a different context by the 
1984 Panel on Decennial Census Methodology (Citro and Cohen 1985) but are very relevant 
here: 


‘‘We recognise that the temper of the times is not conducive to the initiation of new
programs, but we believe that statisticians have the responsibility to describe the facts and
recommend the actions they believe are sensible.’’


Methods for Adjusting for Lack of Independence
in an Application of the Fellegi-Sunter Model 
of Record Linkage 


WILLIAM E. WINKLER¹


ABSTRACT 


Let A × B be the product space of two sets A and B which is divided into matches (pairs representing
the same entity) and nonmatches (pairs representing different entities). Linkage rules are those that divide
A × B into links (designated matches), possible links (pairs for which we delay a decision), and nonlinks
(designated nonmatches). Under fixed bounds on the error rates, Fellegi and Sunter (1969) provided a 
linkage rule that is optimal in the sense that it minimizes the set of possible links. The optimality is depen- 
dent on knowledge of certain probabilities that are used in a crucial likelihood ratio. In applying the record 
linkage model, an independence assumption is often made that allows estimation of the probabilities. 
If the assumption is not met, then a record linkage procedure using estimates computed under the assump- 
tion may not be optimal. This paper contains an examination of methods for adjusting linkage rules when 
the independence assumption is not valid. The presentation takes the form of an empirical analysis of 
lists of businesses for which the truth of matches is known. The number of possible links obtained using 
standard and adjusted computational procedures may be dependent on different samples. Bootstrap 
methods (Efron 1987) are used to examine the variation due to different samples. 


KEY WORDS: Decision rule; Error rate; Steepest ascent; Bootstrap; Capture-recapture. 


1. INTRODUCTION 


This paper presents an analysis of decision rules obtained by applying the Fellegi-Sunter 
model of record linkage to lists of businesses. The analysis compares a rule obtained under 
an independence assumption that is typically made in practice with rules that include methods
for adjusting for the failure of the independence assumption. 

Given two lists, we wish to use identifying information to delineate those record pairs that 
represent the same entities (matches) and those that are different (nonmatches). Thus, we desire 
to define a linkage rule that allows us to divide the cross-product space of pairs into links 
(designated matches), possible links (pairs for which a decision is delayed), and nonlinks 
(designated nonmatches). 

Under fixed bounds on the numbers of erroneous matches and nonmatches, Fellegi and 
Sunter (1969, Theorem) provide a procedure that, in theory, minimizes the number of possible 
links. The optimality is dependent on knowledge of certain probabilities that are used in a crucial 
likelihood ratio. 

In typical applications, an independence assumption is made that allows estimation of the 
probabilities used in the likelihood ratio. The probabilities are called matching parameters. 
If the independence assumption is not valid (Winkler 1985c; Kelley 1986) then linkage rules 
based on the estimated probabilities may not be optimal. 
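
The role of the independence assumption can be illustrated with a minimal sketch. Under the assumption, the likelihood ratio for an agreement pattern factors into one term per compared field, so its logarithm (the matching weight) is a simple sum. The field names, the m- and u-probabilities and the two cutoffs below are hypothetical values chosen for illustration only; they are not estimates from this paper.

```python
import math

# Hypothetical matching parameters (assumptions, not estimates from the paper):
#   M_PROBS[f] = P(agreement on field f | pair is a match)
#   U_PROBS[f] = P(agreement on field f | pair is a nonmatch)
M_PROBS = {"name": 0.90, "street": 0.80, "zip": 0.85}
U_PROBS = {"name": 0.05, "street": 0.10, "zip": 0.20}


def matching_weight(agreement, m=M_PROBS, u=U_PROBS):
    """Log2 likelihood ratio of an agreement pattern, assuming that
    agreements on different fields are conditionally independent."""
    total = 0.0
    for field, agrees in agreement.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1.0 - m[field]) / (1.0 - u[field]))
    return total


def classify(weight, lower=-2.0, upper=5.0):
    """Three-way decision. The cutoffs are illustrative; in practice they
    would be set from the fixed bounds on the two error rates."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "nonlink"
    return "possible link"
```

If agreements on, say, name and street are in fact correlated among nonmatches, the summed weight overstates the evidence, which is the situation the adjustment methods of this paper are meant to address.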


¹ William E. Winkler, Statistical Research Division, U.S. Bureau of the Census, Washington, DC 20233, USA.

Given fixed bounds on error rates, better linkage rules will be those that reduce the set of
possible links. If a rule is based on matching parameters that are estimated under an invalid 
independence assumption, then it may be possible to develop adjustment procedures to deter- 
mine better rules. To test whether one rule is statistically better than another, we use Efron’s 
bootstrap (1987; also Hall 1988). 

The remainder of the paper presents background, methods, and results from applying several 
record linkage rules to lists of businesses. The application involves pairs of lists for which the 
truth and falsehood of linkages are known. 

The second section of this paper is divided into four subsections. The first contains a descrip- 
tion of the data base and the specific subfields that are compared. The second subsection con- 
tains a summary of the Fellegi-Sunter model. The third subsection highlights common 
assumptions made and computational procedures used. It also contains details of computa- 
tional procedures that are specific to the application of this paper. 

The fourth subsection describes the evaluation procedures. The basic evaluation technique 
involves comparing sizes of the regions of possible links when different types of linkage rules 
are applied under fixed error bounds. The sizes of the regions of possible links are statistics 
that may be dependent on the samples used in calibrating the linkage rules. Efron’s bootstrap 
(1987, 1982, 1979; also Hall 1988) is used to evaluate their distributions. 
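
The evaluation idea can be sketched as follows, under simplifying assumptions. Each pair carries a weight under two competing rules and a known truth indicator. For a given rule, links are taken from the top of the weight ordering and nonlinks from the bottom until the stated error bounds would be exceeded; the pairs left over form the region of possible links. Pairs are then resampled with replacement to see how the difference in region sizes varies. The function names, the error bounds expressed as counts, and the plain percentile interval are assumptions made for this illustration; they are not necessarily the bootstrap variant used in the paper.

```python
import random


def possible_link_count(pairs, max_false_links, max_false_nonlinks):
    """pairs: list of (weight, is_match). Returns the number of pairs
    left as possible links under the two error bounds (given as counts)."""
    ordered = sorted(pairs, key=lambda p: p[0], reverse=True)
    n_links, errors = 0, 0
    for _, is_match in ordered:            # designate links from the top
        if not is_match:
            if errors + 1 > max_false_links:
                break
            errors += 1
        n_links += 1
    n_nonlinks, errors = 0, 0
    for _, is_match in reversed(ordered[n_links:]):  # nonlinks from the bottom
        if is_match:
            if errors + 1 > max_false_nonlinks:
                break
            errors += 1
        n_nonlinks += 1
    return len(ordered) - n_links - n_nonlinks


def bootstrap_difference(weights_a, weights_b, truth,
                         n_boot=200, bounds=(2, 2), seed=0):
    """Percentile interval for (size under rule A) - (size under rule B)."""
    rng = random.Random(seed)
    n = len(truth)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample pairs
        size_a = possible_link_count([(weights_a[i], truth[i]) for i in idx], *bounds)
        size_b = possible_link_count([(weights_b[i], truth[i]) for i in idx], *bounds)
        diffs.append(size_a - size_b)
    diffs.sort()
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot) - 1]
```

An interval that lies wholly above zero would suggest that rule B leaves fewer possible links than rule A at the same error bounds.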

Results are presented in the third section. This is followed in the fourth section by discus- 
sion of the robustness of weight adjustment procedures, the type of conditioning represented 
by the adjusted weights, additional types of comparisons, and the use of extra blocking criteria. 
Finally, the paper concludes with a summary. 


2. DATA BASE, LINKAGE MODEL, COMPUTATIONAL AND 
EVALUATION PROCEDURES 


2.1 Data Base 


The description of the data base is divided into two components. The first component is 
a description of the overall properties. The second contains a listing of the specific subfield 
comparisons that are made. 


2.1.1 Overall Description 


The data base of 57,900 records contains 54,850 records that are identified as individual 
companies and 3,050 duplicates. A pair of records that consists of a company and its correspon- 
ding duplicate is a match; all others are nonmatches. 

The data base was constructed from 11 Energy Information Administration (EIA) and 47 
State and industry lists containing 176,000 records. Duplicates were identified via elementary 
techniques, through call-backs (phone numbers are sometimes present) and through surveying. 

The decision rules that are developed are applied only to those pairs that generally
represent hard-to-identify duplicates. Easy-to-identify duplicates are those pairs having
substantial portions of their names and addresses agreeing on a character-by-character basis.

An example of a hard-to-identify duplicate might be: 


NAME             STREET             CITY          STATE   ZIP

Zabrinsky Fuel   16 W Sycamore St   Dayton        OH      [illegible]
Zabrinky Cmpny   167 Sycamere St    Springfield   OH      53315

We observe that both ‘Zabrinsky’ and ‘Sycamore’ are spelled wrong in the second record, that
‘Cmpny’ is a nonstandard abbreviation, and that Springfield OH, a suburb of Dayton, has
Postal ZIP code 53315.


2.1.2 Specific Subfields Compared 


There are four sets of specific subfields that are compared in each pair of records. First are 
those that can be obtained through easy substring comparisons. For instance, we could com- 
pare character positions 1-4 of the NAME field from one record with the corresponding same 
character positions of the NAME field in another record. 

In Table 1 WL-NAME is obtained by sorting the NAME field by words of decreasing length 
with ties broken by an alpha sort. Corresponding subfields are then compared on a character- 
by-character basis. 

The second set is the four comparisons of the first and second largest words in the NAME 
field. Ties are again broken by an alpha sort. 
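
The word-length sort and the positional comparisons can be illustrated with a short Python
sketch. The function names, the upper-casing, the single-space join, and the blank padding are
assumptions made for this illustration and are not taken from the software used in the
application; the character positions are those listed for WL-NAME in Table 1.

```python
# Hypothetical helpers for illustration only.
WL_NAME_POSITIONS = [(1, 4), (5, 10), (11, 20), (21, 30)]  # 1-based, inclusive


def wl_name(name):
    """Sort the words of a NAME field by decreasing length, ties broken alphabetically."""
    words = name.upper().split()
    words.sort(key=lambda w: (-len(w), w))
    return " ".join(words)


def compare_subfields(field_a, field_b, positions):
    """Return one agree/disagree flag per character range.

    Both fields are padded with blanks to the widest position, so two blank
    subfields count as agreeing (an assumption of this sketch).
    """
    width = max(end for _, end in positions)
    a = field_a.ljust(width)[:width]
    b = field_b.ljust(width)[:width]
    return [a[start - 1:end] == b[start - 1:end] for start, end in positions]


sorted_a = wl_name("Johns Geo M")
sorted_b = wl_name("Geo M Johns Jobber")
flags = compare_subfields(wl_name("Zabrinsky Fuel"), wl_name("Zabrinky Cmpny"),
                          WL_NAME_POSITIONS)
```

In this sketch the extra word ‘Jobber’ moves to the front of the second sorted name, which
shows why the sorted field is compared subfield by subfield rather than as a whole.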

The last two sets are of subsets of the STREET and NAME fields that are designated by 
highly sophisticated software. ZIPSTAN software from the Census Bureau (U.S. Dept. of 
Commerce 1978b) is used to obtain corresponding subfields of the STREET field. The sub- 
fields are: House No., Prefixes 1 and 2, Street Name, Suffixes 1 and 2, and Unit. Prefixes are 
directions such as East and North. Suffixes are words such as Street and Road. Unit designates 
identifiers such as apartment or suite number. 

The NSKGENS module from software used in the Canadian Business Register (Statistics
Canada 1984, 1982) is used to obtain corresponding subfields of the NAME field. NSKGENS
creates three groups of words. The first group consists of three abbreviations with the first
corresponding to surname if present. The second group contains two words with the first cor- 
responding to surname. The third group is a single word obtained by concatenating and 
abbreviating individual words in the NAME field. Details are given in Winkler (1987) or in 
Statistics Canada (1984, 1982). 


2.2 Fellegi-Sunter Model 


The Fellegi-Sunter Model uses a decision-theoretic approach establishing the validity of prin- 
ciples first used in practice by Newcombe (Newcombe et al. 1959). To give an overview, we 
describe the model in terms of ordered pairs in a product space. The description closely follows 
Fellegi and Sunter (1969, pp. 1184-1187). 


Table 1
Corresponding Subfields Compared on a Character-by-Character Basis

Field        Character Positions

NAME         1-4, 5-10, 11-20, 21-30
STREET       1-6, 7-15, 16-30
ZIP          1-3, 4-5
CITY         1-5, 6-10, 11-15
STATE        1-2
TELEPHONE    1-3, 4-6, 7-10
WL-NAME      1-4, 5-10, 11-20, 21-30


There are two populations $A$ and $B$ whose elements will be denoted by $a$ and $b$. We assume
that some elements are common to $A$ and $B$. Consequently the set of ordered pairs

$$A \times B = \{(a,b) : a \in A,\ b \in B\}$$

is the union of two disjoint sets of matches

$$M = \{(a,b) : a = b,\ a \in A,\ b \in B\}$$

and nonmatches

$$U = \{(a,b) : a \neq b,\ a \in A,\ b \in B\}.$$

The records corresponding to members of $A$ and $B$ are denoted by $\alpha(a)$ and $\beta(b)$,
respectively. The comparison vector $\gamma$ associated with the records is defined by:

$$\gamma[\alpha(a), \beta(b)] = \{\gamma^1[\alpha(a), \beta(b)],\ \gamma^2[\alpha(a), \beta(b)],\ \ldots,\ \gamma^K[\alpha(a), \beta(b)]\}.$$


Each of the $\gamma^i$, $i = 1, \ldots, K$, represents a specific comparison. For instance,
$\gamma^1$ could represent agreement/disagreement on sex. $\gamma^2$ could represent the
comparison that two surnames agree and take a specific value or that they disagree.

Where confusion does not arise, the function $\gamma$ on $A \times B$ will be denoted by
$\gamma(\alpha,\beta)$, $\gamma(a,b)$, or $\gamma$. The set of all possible realizations of
$\gamma$ is denoted by $\Gamma$.

The conditional probability of $\gamma(a,b)$ if $(a,b) \in M$ is given by

$$m(\gamma) = P\{\gamma[\alpha(a), \beta(b)] \mid (a,b) \in M\}
           = \sum_{(a,b) \in M} P\{\gamma[\alpha(a), \beta(b)]\} \cdot P[(a,b) \mid M].$$

Similarly we denote the conditional probability of $\gamma$ if $(a,b) \in U$ by $u(\gamma)$.

We observe a vector of information $\gamma(a,b)$ associated with pair $(a,b)$ and wish to
designate a pair as a link (denote the decision by $A_1$), a possible link (decision $A_2$), or
a nonlink (decision $A_3$). A linkage rule $L$ is defined as a mapping from $\Gamma$, the
comparison space, onto a set of random decision functions $D = \{d(\gamma)\}$ where

$$d(\gamma) = \{P(A_1 \mid \gamma),\ P(A_2 \mid \gamma),\ P(A_3 \mid \gamma)\}; \quad \gamma \in \Gamma$$

and

$$\sum_{i=1}^{3} P(A_i \mid \gamma) = 1.$$


There are two types of error associated with a linkage rule. A Type I error occurs if an
unmatched comparison is erroneously linked. It has probability

$$P(A_1 \mid U) = \sum_{\gamma \in \Gamma} u(\gamma) \cdot P(A_1 \mid \gamma).$$

A Type II error occurs if a matched comparison is erroneously not linked. It has probability

$$P(A_3 \mid M) = \sum_{\gamma \in \Gamma} m(\gamma) \cdot P(A_3 \mid \gamma).$$


Fellegi and Sunter (1969) define a linkage rule $L_0$, with associated decisions $A_1$, $A_2$,
and $A_3$, that is optimal in the following sense:

THEOREM (Fellegi-Sunter 1969). Let $L'$ be a linkage rule with associated decisions $A_1'$,
$A_2'$, and $A_3'$ such that it has the same error probabilities
$P(A_3' \mid M) = P(A_3 \mid M)$ and $P(A_1' \mid U) = P(A_1 \mid U)$ as $L_0$. Then $L_0$ is
optimal in that $P(A_2 \mid U) \le P(A_2' \mid U)$ and $P(A_2 \mid M) \le P(A_2' \mid M)$.

In other words, if $L'$ is any competitor of $L_0$ having the same Type I and Type II error
rates (which are both conditional probabilities), then the conditional probabilities (either on
set $U$ or $M$) of not making a decision under rule $L'$ are never smaller than under $L_0$.
$L_0$ is described in subsection 2.3.1.

The Fellegi-Sunter linkage rule is actually optimal with respect to any set $Q$ of ordered pairs
in $A \times B$ if we define error probabilities $P_Q$ and a linkage rule $L_Q$ conditional on
$Q$. Thus, it may be possible to define subsets of $A \times B$ on which we make use of
differing amounts and types of available information.

For instance, if we have a set of pairs in which telephone number is present, we might 
use telephone number and a few characters from the name to designate links. With other 
pairs, we may additionally have to utilize information from the street address and the 
city name. 

Sets of ordered pairs $Q$ on which the Fellegi-Sunter linkage rule is applied are often
obtained by blocking criteria. Blocking criteria are sort keys that are used to reduce the
number of pairs that are considered. Rather than consider all pairs in $A \times B$, we might
only consider pairs that agree on the first three digits of the ZIP code or on a suitable
abbreviation of surname.
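
As a minimal sketch of how a blocking criterion reduces the number of pairs, the following
Python fragment groups records by a blocking key and forms pairs only within groups. The record
layout (dictionaries with `name` and `zip` keys), the function names, and the treatment of
records with an incomplete key are assumptions of the illustration; the key imitates a criterion
of the form "3 digits ZIP, 4 characters NAME" applied within a single list.

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, key):
    """Yield index pairs of records that share the same blocking key."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        k = key(rec)
        if k is not None:          # records with an incomplete key are not blocked
            blocks[k].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)


def zip3_name4(rec):
    """Hypothetical key: first 3 digits of ZIP plus first 4 characters of NAME."""
    zip3, name4 = rec["zip"][:3], rec["name"][:4].upper()
    return (zip3, name4) if len(zip3) == 3 and name4 else None


records = [
    {"name": "Zabrinsky Fuel", "zip": "53315"},
    {"name": "Zabrinky Cmpny", "zip": "53315"},
    {"name": "Molar Petro", "zip": "53315"},
]
candidate_pairs = sorted(block_pairs(records, zip3_name4))
all_pairs = len(records) * (len(records) - 1) // 2
```

Here three records give three possible pairs, of which only one survives blocking.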


2.3. Computational Procedures 


This section is divided into five parts. The first part contains a description of the general 
linkage rule of the Fellegi-Sunter Model. The second contains a description of the simplified 
computational procedures when a conditional independence assumption is made. 

Background on the validity of the conditional independence assumption is presented in the 
third part. The fourth describes two general methods of adapting computational procedures. 
The fifth provides a description of the specific computational procedures of this paper. 


2.3.1 General Form of Linkage Rule 


To provide a background for understanding why specific computational procedures are used,
we consider the following likelihood ratio

$$R = R[\gamma(a,b)] = m(\gamma)/u(\gamma). \tag{2.1}$$

We observe that, if $\gamma$ represents a comparison of $K$ fields, then there are at least
$2^K$ probabilities of form $m(\gamma)$. If $\gamma$ represents agreements of $K$ fields, we
would expect this to occur more often for matches $M$ than for nonmatches $U$. The ratio $R$
would then be large. Alternatively, if $\gamma$ consists of disagreements, the ratio $R$ would
be small.

If the numerator is positive and the denominator is zero in (2.1), we assign an arbitrary very
large number to the ratio. The Fellegi-Sunter linkage rule takes the form:


If R > UPPER, then denote (a,b) as a link.
If LOWER ≤ R ≤ UPPER, then denote (a,b) as a possible link.          (2.2)
If R < LOWER, then denote (a,b) as a nonlink.


The cutoffs LOWER and UPPER are determined by the desired error rate bounds. 
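
The three-way rule (2.2) can be written as a short Python sketch. The function names are
hypothetical. Using infinity for a zero denominator is one way to realize the "arbitrary very
large number" mentioned above, and raising an error when both probabilities are zero is an
assumption of this sketch, since that case is not discussed in the text.

```python
import math


def likelihood_ratio(m, u):
    """R = m(gamma) / u(gamma) for one comparison pattern."""
    if u == 0.0:
        if m > 0.0:
            return math.inf        # stand-in for an arbitrary very large number
        raise ValueError("pattern has zero probability in both M and U")
    return m / u


def classify(r, lower, upper):
    """Apply rule (2.2): link, possible link, or nonlink."""
    if r > upper:
        return "link"
    if r < lower:
        return "nonlink"
    return "possible link"


decisions = [
    classify(likelihood_ratio(0.90, 0.01), lower=0.5, upper=20.0),
    classify(likelihood_ratio(0.10, 0.10), lower=0.5, upper=20.0),
    classify(likelihood_ratio(0.01, 0.50), lower=0.5, upper=20.0),
]
```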


2.3.2 Simplification Under Conditional Independence Assumption 


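In practice, computation is simplified two ways. The first is by the conditional independence
assumption of Fellegi and Sunter (1969):
For each $\gamma \in \Gamma$

$$m(\gamma) = m_1(\gamma^1) \cdot m_2(\gamma^2) \cdots m_K(\gamma^K) \quad \text{and}$$

$$u(\gamma) = u_1(\gamma^1) \cdot u_2(\gamma^2) \cdots u_K(\gamma^K)$$

where for $i = 1, 2, \ldots, K$

$$m_i(\gamma^i) = P(\gamma^i \mid (a,b) \in M) \quad \text{and} \quad
  u_i(\gamma^i) = P(\gamma^i \mid (a,b) \in U).$$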


This assumption basically is that agreement on one characteristic such as surname does not 
depend on agreement of other characteristics such as house number or age. 

The second is to use a computationally convenient function of the ratio in (2.1). $\log_2$ is
used. We then have

$$W(\gamma) = \log_2[m(\gamma)/u(\gamma)] = W^1 + W^2 + \cdots + W^K \tag{2.3}$$

where $W^i = \log_2[m_i(\gamma^i)/u_i(\gamma^i)]$ for $i = 1, 2, \ldots, K$. We call $W$ the
total comparison weight associated with a pair and $W^i$, $i = 1, 2, \ldots, K$, the individual
comparison weights.

For the remainder of the paper we will assume that each component $\gamma^i$,
$i = 1, 2, \ldots, K$, in $\gamma$ represents a two-state comparison (e.g., agree/disagree). For
convenience, we denote agreement in the $i$th component by $\gamma^i_0$, $i = 1, 2, \ldots, K$.
Under the conditional independence assumption, for each $i = 1, 2, \ldots, K$, we need to
estimate probabilities of the forms

$$P(\gamma^i = \gamma^i_0 \mid M) \quad \text{and} \quad P(\gamma^i = \gamma^i_0 \mid U). \tag{2.4}$$

Using a set of pairs for which the truth and falsehood of matches are known, for each
agreement $\gamma^i_0$, $i = 1, 2, \ldots, K$, we divide the set into the four subsets
determined by the agree/disagree and match/nonmatch statuses in (2.4) to perform the estimation.
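
The estimation of (2.4) and the weight sum (2.3) can be sketched in Python as follows. The
function names and data layout are hypothetical, and the additive constant `eps`, which keeps
the logarithms finite when a cell of the agree/disagree by match/nonmatch table is empty, is an
assumption of this sketch and not part of the procedure described above.

```python
import math


def estimate_weights(vectors, is_match, eps=0.5):
    """Estimate (agreement weight, disagreement weight) for each comparison.

    vectors  : list of tuples of booleans, one agree/disagree flag per comparison
    is_match : list of booleans giving the known match status of each pair
    """
    n_m = sum(1 for t in is_match if t)
    n_u = len(is_match) - n_m
    weights = []
    for i in range(len(vectors[0])):
        agree_m = sum(1 for v, t in zip(vectors, is_match) if t and v[i])
        agree_u = sum(1 for v, t in zip(vectors, is_match) if not t and v[i])
        m_i = (agree_m + eps) / (n_m + 2 * eps)    # estimate of P(agree | M)
        u_i = (agree_u + eps) / (n_u + 2 * eps)    # estimate of P(agree | U)
        weights.append((math.log2(m_i / u_i),
                        math.log2((1 - m_i) / (1 - u_i))))
    return weights


def total_weight(vector, weights):
    """Total comparison weight: sum of individual weights under independence."""
    return sum(w_agree if agrees else w_disagree
               for agrees, (w_agree, w_disagree) in zip(vector, weights))


T, F = True, False
vectors = [(T, T), (T, T), (T, F), (F, F),      # known matches
           (T, F), (F, T), (F, F), (F, F)]      # known nonmatches
is_match = [T, T, T, T, F, F, F, F]
weights = estimate_weights(vectors, is_match, eps=0.0)
```

With these toy counts the first comparison agrees in 3 of 4 matches and 1 of 4 nonmatches, so
its agreement weight is $\log_2 3$ and its disagreement weight is $-\log_2 3$.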

If no conditional independence assumption is made, we need to estimate $2 \cdot (2^K - 1)$
probabilities of form (2.1) and divide the set of pairs for which truth and falsehood are known
into $2 \cdot (2^K - 1)$ subsets. Even with a small number of comparisons (say, 6 or fewer), we
may not be able to obtain sufficiently large samples to allow accurate estimation of the
probabilities.


2.3.3 Validity of Conditional Independence Assumption


Winkler (1985c) has shown that the independence assumption is not valid for simple
comparisons of portions of the name and street address fields for lists of businesses. Using
similar portions of the name and street fields, Kelley (1986) has shown that the independence
assumption is not valid for files of individuals. Furthermore, Kelley and Winkler have each
shown that matching efficacy is sensitive to the set of pairs over which probabilities of the
form (2.4) are computed.

Fellegi and Sunter indicate that, if the conditional independence assumption is not valid, 
then estimates of weights that are obtained via formula (2.3) will lose their strict probabilistic 
interpretation. By this, they mean that the linkage rule of their theorem may not actually min- 
imize the number of possible links. They indicate that they believe their procedure to be robust 
to departures from the independence assumption. 

Under the independence assumption, probabilities are computed as products of probabilities 
of the form (2.4). If we have a set of pairs for which truth and falsehood of matches are known, 
then we can adjust probabilities of form (2.4) for departures from the independence assump- 
tion. If the total weights obtained by adjustment yield substantially smaller sets of potential 
links under fixed bounds on error rates, then the Fellegi-Sunter procedure may not be robust 
to departures from independence. 


2.3.4 General Adjustments 


There are two general adjustments to the basic methods of computing individual compar- 
ison weights. The first consists of dividing the subset of pairs in A x B over which individ- 
ual comparison weights are computed into several subsets. The linkage rule is obtained by 
restricting the basic Fellegi-Sunter rule to correspond to the different subsets on which 
weights are computed. Individual comparison weights may vary significantly in different 
subsets. 

The second adjustment consists of modifying individual comparison weights. Under the
independence assumption, we consider the equation

$$W = \log_2\left(\frac{P(\gamma \in B_1 \cap B_2 \cap \cdots \cap B_K \mid M)}
                        {P(\gamma \in B_1 \cap B_2 \cap \cdots \cap B_K \mid U)}\right)
    = W^1 + W^2 + \cdots + W^K$$

where, for $i = 1, 2, \ldots, K$, $W^i = \log_2(P(\gamma \in B_i \mid M)/P(\gamma \in B_i \mid U))$
and $B_i$ is the set $\{\gamma^i = \gamma^i_0\}$ or its complement. We wish to find
computationally tractable methods of adjusting the $W^i$, $i = 1, 2, \ldots, K$, so that their
sum yields better linkage rules.

If there is a sample for which the truth and falsehood of matches are known, then we can 
estimate individual comparison weights (Tepping 1968) and the adjustments. 

The simplest adjustment procedure involves a steepest ascent approach (e.g., Cochran 
and Cox 1957). To begin, we use the known truth and falsehood of matches within a 
sample to estimate probabilities of the form (2.4). The probabilities are then used in computing 
individual comparison weights that are added to obtain an estimate of total weight (2.3). 
For each pair of fixed bounds on Type I and Type II errors, the cutoffs UPPER and 
LOWER of (2.2) can be determined. The number of potential links for rules of the form (2.2) 
follows immediately. 
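
One way to carry out the cutoff determination on a sample with known match status is sketched
below in Python. The sketch is an illustration under stated assumptions, not the procedure used
in the application: error rates are taken as the share of nonmatches among links and the share
of matches among nonlinks, candidate cutoffs are placed midway between observed total weights,
and the function name and signature are hypothetical.

```python
from bisect import bisect_left


def find_cutoffs(weights, is_match, max_false_link=0.02, max_false_nonlink=0.03):
    """Return (LOWER, UPPER, number of possible links) for total weights of known status."""
    data = sorted(zip(weights, is_match))
    ws = [w for w, _ in data]
    values = sorted(set(ws))
    # Candidate cutoffs never coincide with an observed weight.
    cands = ([values[0] - 1.0]
             + [(a + b) / 2 for a, b in zip(values, values[1:])]
             + [values[-1] + 1.0])
    cum = [0]                                   # cum[j] = matches among the j smallest weights
    for _, matched in data:
        cum.append(cum[-1] + int(matched))
    n, n_match = len(data), cum[-1]

    def below(c):
        return bisect_left(ws, c)               # number of pairs with weight < c

    ok_lower, ok_upper = [], []
    for c in cands:
        j = below(c)
        false_nonlinks = cum[j]                 # matches that would become nonlinks
        false_links = (n - j) - (n_match - cum[j])   # nonmatches that would become links
        if false_nonlinks <= max_false_nonlink * j:
            ok_lower.append(c)
        if false_links <= max_false_link * (n - j):
            ok_upper.append(c)
    count, lower, upper = min(
        ((below(u) - below(l), l, u) for l in ok_lower for u in ok_upper if l <= u),
        key=lambda t: t[0],
    )
    return lower, upper, count


weights = [-3, -2, -1, 0, 1, 2, 3, 4]
is_match = [False, False, False, True, False, True, True, True]
strict = find_cutoffs(weights, is_match, 0.0, 0.0)
```

With no errors allowed, the two pairs with weights 0 and 1 cannot be separated and remain
possible links; loosening the bound on false links shrinks that region.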

Next, we choose an individual comparison weight, change it by a fixed amount (say +1),
recompute the total weight (2.3) using the new individual weight, and find new cutoffs UPPER
and LOWER and a new region of potential links.

If, under fixed bounds on errors, the size of the region of possible links decreases, then we
continue adjusting the individual comparison weight (either up or down) until the region ceases
its decrease in size. We continue by varying other individual weights in a similar manner.

If the size of the region of possible links decreases substantially, then we know the condi- 
tional independence assumption is not valid for the set of comparisons. If the conditional inde- 
pendence assumption were valid, then the estimated weights would accurately represent the 
true weights. The regions of possible links would be minimal by the theorem of Fellegi and 
Sunter. 

A linkage rule that is based on adjusted individual comparison weights depends on the sample 
used in the steepest ascent procedure. 
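
The one-weight-at-a-time search described above can be sketched in Python. This is an
illustrative reading of the procedure, not the code used for the results: the function names,
the error-rate definitions (shares of misclassified pairs among links and among nonlinks), the
midpoint candidate cutoffs, and the restriction to agreement weights are all assumptions.

```python
from bisect import bisect_left


def possible_links(totals, is_match, max_false_link, max_false_nonlink):
    """Smallest number of possible links attainable under the two error bounds."""
    data = sorted(zip(totals, is_match))
    ws = [w for w, _ in data]
    vals = sorted(set(ws))
    cands = [vals[0] - 1.0] + [(a + b) / 2 for a, b in zip(vals, vals[1:])] + [vals[-1] + 1.0]
    cum = [0]
    for _, matched in data:
        cum.append(cum[-1] + int(matched))
    n, n_match = len(data), cum[-1]
    below = [bisect_left(ws, c) for c in cands]      # pairs under each candidate cutoff
    ok_lower = [j for j in below if cum[j] <= max_false_nonlink * j]
    ok_upper = [j for j in below
                if (n - j) - (n_match - cum[j]) <= max_false_link * (n - j)]
    return min(u - l for l in ok_lower for u in ok_upper if l <= u)


def region_size(vectors, is_match, weights, bounds):
    totals = [sum(a if g else d for g, (a, d) in zip(v, weights)) for v in vectors]
    return possible_links(totals, is_match, *bounds)


def adjust_weights(vectors, is_match, weights, bounds=(0.02, 0.03), step=1.0, max_steps=20):
    """Shift each agreement weight up or down while the region of possible links shrinks."""
    weights = [list(w) for w in weights]
    best = region_size(vectors, is_match, weights, bounds)
    for i in range(len(weights)):
        for direction in (step, -step):
            for _ in range(max_steps):
                weights[i][0] += direction
                trial = region_size(vectors, is_match, weights, bounds)
                if trial < best:
                    best = trial
                else:
                    weights[i][0] -= direction   # undo the move that did not help
                    break
    return [tuple(w) for w in weights], best


T, F = True, False
vectors = [(T, F), (T, F), (T, F), (T, T), (F, T), (F, T), (F, T), (F, F)]
is_match = [T, T, T, T, F, F, F, F]
start = [(1.0, -1.0), (1.0, -1.0)]
before = region_size(vectors, is_match, start, (0.0, 0.0))
adjusted, after = adjust_weights(vectors, is_match, start, bounds=(0.0, 0.0))
```

In this toy sample the two comparisons receive equal weights initially, so six pairs share a
total weight of zero; raising the first agreement weight by one step separates matches from
nonmatches completely.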


2.3.5 Specific Methods 


To describe the specific methods of computing weights and obtaining corresponding linkage 
rules used in this paper, we need some additional background. 

The only pairs considered are those that agree on at least one of the blocking criteria in 
Table 2. 

We subdivide the set of pairs obtained via the four sets of blocking criteria into the five classes 
given in Table 3. 


Table 2 


Blocking Criteria 


# Characters Used 


1. 3 digits ZIP, 4 characters NAME 

2. 5 digits ZIP, 6 characters STREET 

3. 10 digits TELEPHONE 

4.* Word length sort NAME field, then use 1. 


* This criterion also has a deletion stage which prevents matching on commonly 
occurring words such as ‘OIL’, ‘FUEL’, ‘CORP’, and ‘DISTRIBUTOR.’ 


Table 3
Sets of Pairs Determined by Blocking Criteria

Class   # Pairs   Determining Blocking Criteria

1        1021     Agreeing on criterion 1 and no other or simultaneously
                  agreeing on criteria 1 and 4 and no others.
2         624     Agreeing on criterion 2 and no other or simultaneously
                  agreeing on criteria 2 and 3 and no others.
3         256     Agreeing on criterion 3 only.
4         344     Agreeing on criterion 4 only.
5        2240     Agreeing on at least one criterion but not in classes 1-4.


Class 5 contains pairs that generally agree on two or more blocking criteria. Classes 1-5
contain 2991 matches and 1494 nonmatches and miss 59 known matches. The determination
of sets of blocking criteria and classes is treated in detail in Winkler (1985b, 1987).

We classify linkage rules by the different ways in which the individual comparison weights 
are computed and how resultant linkage rules are defined. 

The first type, AA, of weight computation is an overall aggregate in all pairs. The second, 
A, is an overall aggregate in classes 1-4. The third, U, yields separate weight computations 
in classes 1-4. The fourth, C, uses steepest ascent to adjust the individual weight computation 
of Type U. 

Each successive type of linkage rule involves increasingly more complex weight computa- 
tions. Matches outside classes 1-5 are not considered in the results section because their number 
is constant for each of the four linkage rules. 


2.4 Evaluation Procedures 


The basic evaluation technique involves comparing sizes of the region of possible links when 
the different types of linkage rules are applied under fixed error bounds. 

Efron’s bootstrap (1987, 1982, 1979) is used to estimate confidence intervals for statistics 
such as the number of possible links. As these statistics are obtained under complicated rules, 
it seems unlikely that closed-form estimates can be determined. 

If there are sets of pairs for which the truth and falsehood of matches are known, then we 
can use Efron’s bootstrap to estimate the variation of parameters in the following fashion: 


1. Draw calibration samples of equal size with replacement. 

2. Estimate individual comparison weights of the form (2.4) using the known truth and 
falsehood in the sample and use them to estimate total weight via (2.3). 

3. Compute cutoffs LOWER and UPPER using each sample (in our application we allow 
at most 2 percent of the links to be nonmatches and 3 percent of the nonlinks to be matches). 

4. Using individual comparison weights from step 2, compute a total comparison weight for
each pair in the entire selected set of pairs. Use cutoffs from step 3 to classify pairs as links,
possible links, and nonlinks.

5. Using estimates from individual samples, determine the means and variances of the cutoff 
weights, of the misclassification rates, and of the number of possible links. 


The bounds (2 and 3 percent, step 3) are used to try to assure that the corresponding classifica- 
tion error rates in the entire data base are less than 5 percent. 
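
The resampling loop of steps 1 to 5 can be sketched in Python. The function
`calibrate_and_count` stands for steps 2 to 4 and is supplied by the caller; the toy rule used
here (cutoffs that allow no errors within the calibration sample, applied to a single
precomputed weight per pair) is purely an assumption for the illustration, as are all names and
the data.

```python
import random
import statistics


def bootstrap_counts(pairs, calibrate_and_count, sample_size, replications=500, seed=0):
    """Counts of possible links over calibration samples drawn with replacement."""
    rng = random.Random(seed)
    counts = []
    for _ in range(replications):
        sample = [rng.choice(pairs) for _ in range(sample_size)]   # step 1
        counts.append(calibrate_and_count(sample, pairs))           # steps 2-4
    return counts


def toy_rule(sample, pairs):
    """Toy stand-in: links lie above every sampled nonmatch, nonlinks below every sampled match."""
    upper = max((w for w, matched in sample if not matched), default=float("-inf"))
    lower = min((w for w, matched in sample if matched), default=float("inf"))
    return sum(1 for w, _ in pairs if lower <= w <= upper)


pairs = ([(w, True) for w in (1, 3, 4, 5, 6)]
         + [(w, False) for w in (-3, -2, 0, 2, 3)])
counts = bootstrap_counts(pairs, toy_rule, sample_size=6, replications=200)
mean_count = statistics.mean(counts)          # step 5
var_count = statistics.variance(counts)       # step 5
```

Fixing the seed makes a run reproducible, which helps when the same calibration samples must be
reused to compare two weighting methods.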


Table 4
Linkage Rules by Type of Weight Computation and Sets of Pairs to Which Applied

Type   Individual Weight Computation   Linkage Rule

AA     Uniformly over all pairs        Over all pairs
       in Classes 1-5
A      Uniformly over all pairs        Designate pairs in Class 5 Links, Apply Fellegi-
       in Classes 1-4                  Sunter Rule to remaining pairs in Classes 1-4
U      Uniformly in each               Designate pairs in Class 5 Links, Apply Fellegi-
       Class 1-4                       Sunter Rule individually in Classes 1-4
C      Uniformly in each               Same as U except modify weights using steepest
       Class 1-4                       ascent procedure


Computations and adjustments must be performed consistently across calibration samples.
Identical adjustment procedures must be used in obtaining individual adjusted weights, total
weights, and cutoffs. If an individual weight is adjusted upward (step 2) by amount x or
percentage y with one sample, then the same adjustment must be used with other samples.

As the underlying distributions may not be normal or may be biased and skewed, we can 
use new techniques of Efron (1982, 1987; also Hall 1988) to determine confidence intervals. 
Hall (1988) has shown the theoretical validity of the nonparametric bootstrap that includes 
an acceleration-constant type adjustment for skewness of a distribution. 


3. RESULTS 


The results in this section comprise three parts. The first part is an overall comparison from 
using the four different weighting methods described in section 2.3.5. The second part con- 
tains more details about the best two methods from the first part. The third part contains results 
from the bootstrap evaluation. 


3.1 Overall Comparison 


We place fixed upper bounds of 5 percent on the number of matches misclassified as nonmatches 
and 2 percent on the number of nonmatches misclassified as matches. As we are using discrete 
data, actual error rates will generally not equal their upper bounds (Table 5, columns 2 and 3). 

We see that, as the complexity of the application of the weighting methodology increases, 
the number of possible links (size of manual review region) decreases dramatically from 1512 
to 97. This indicates that the increasing complexity of the weight computations yields increas- 
ingly better decision rules. 

We see that the last two methods, which both involve computing individual comparison 
weights separately in classes 1-4, yield the smallest sets of possible links (695 and 97, respec- 
tively). 


3.2 Best Methods 


We consider the best two methods, linkage rules using weights of Type U and of Type C, 
in greater detail. Results from applying weights of Type U and Type C are presented in Tables 
6 and 7, respectively. In determining cutoff weights by class, we place rough upper bounds 
of 5 percent misclassified nonmatches and 2 percent misclassified matches in each class. The 
overall upper bound is maintained. 

Comparing columns 4 and 5 across Tables 6 and 7, we see that the corresponding numbers of
misclassified matches and nonmatches are approximately the same. This is consistent with the
bounding method. In every class, the linkage rule using Type C weights yields fewer possible
links than the rule using Type U weights.

The numbers of records classified as possible links are smaller in classes 1 and 4 (83 versus 55
and 44 versus 0, respectively) and dramatically smaller in classes 2 and 3 (409 versus 0 and 159
versus 42, respectively).

One hundred percent of the pairs in classes 2 and 4 are classified by the procedure that uses 
Type C weights. 

Two variations distinguish the linkage rule based on type C weights from the rule based on
type U weights. First, we vary agreement weights associated with the four subfields of the
NAME after words have been sorted by decreasing length (Table 8). The only substantial
variations (greater than 2.5 on the $\log_2$ scale) occur in Class 2.


Table 5
Error Rates and Number of Possible Links from Applying Different Weighting Methods

          Misclassed as (Rate)    Total Classed as
Weight    Non-                    Non-                 Possible
Type      Match      Match        Match      Match     Links

AA        .047       .020           964      2009      1512
A         .041       .015           952      2481      1052
U         .050       .020          1083      2707       695
C         .033       .019          1441      2947        97


Table 6
Results from Using a Linkage Rule Based on Type U Weights for Delineating Matches and Nonmatches
(5 Percent Overall Misclassification Rate)

         Cutoff Weights    Misclassed as         Total Classed as    Total
                           Non-                  Non-                Not        Total
Class    LOWER    UPPER    Match   Match         Match     Match     Classed    Records

1          0.5      6.5      39    14              674       264        83       1021
2         -4.5      3.5       2    4               100       115       409        624
3         -4.5      6.5       2    1                55        42       159        256
4          2.5     11.5      11    [illegible]     254        46        44        344

Totals                       54    [illegible]    1083       467       695       2245


Table 7
Results from Using a Linkage Rule Based on Type C Weights for Delineating Matches and Nonmatches
(3 Percent Overall Misclassification Rate)

         Cutoff Weights          Misclassed as    Total Classed as    Total
                                 Non-             Non-                Not        Total
Class    LOWER    UPPER          Match   Match    Match     Match     Classed    Records

1          4.5    [illegible]      28      8        692       274        55       1021
2          2.5      2.5             5      3        379       245         0        624
3         -0.5      4.5             5      6        104       110        42        256
4          8.5      8.5             9      4        266        78         0        344

Totals                             47     21       1441       707        97       2245


Table 8
Steepest Ascent Adjustment to Agreement Weights for Subfields Obtained by Wordlength Sort

                Subfield
Class     1     2     3     4

1 − =f
2 ++ 2 SF ab +
3 Te 5 − == ==
4 . + − aF

‘.’ means deviation less than 1.0, ‘+’, ‘−’ mean deviation greater than 1.0 and less than 2.5,
and ‘++’ means deviation greater than 2.5.

The second is that the agreement weight is only utilized if four corresponding subfields, the 
three subfields of CITY and the one STATE, agree. The variation, in effect, typically increases 
the relative distinguishing power of agreements/disagreements in subfields other than the CITY 
field. 

The largest reduction (from 409 to 0) in the number of possible links takes place in Class
2. A slightly higher proportion (.95 ≈ 359/379) of nonlinks have an agreeing CITY field than
links (.91 ≈ 223/245).

The following is an example of a match that is not designated as a link using the rule based 
on Type U weights but is using the rule based on Type C weights. 


NAME                        STREET            CITY     STATE   ZIP
Roberts Heat Oils           167 Sycamore St   Dayton   OH      53315
Maxwell S Robert Heat Oil   167 Sycamore St   Dayton   OH      53315


The first six digits of the telephone number also agreed. 
The following is an example of an erroneous match using Type C weights. 


NAME          STREET            CITY     STATE   ZIP
Molar Petro   167 Sycamore St   Dayton   OH      53315
Petrochem     167 Sycamore St   Dayton   OH      53315


These two companies do business from the same location and also have identical phone 
numbers. 
The following is an example of an erroneous nonmatch using Type C weights. 


NAME                 STREET            CITY           STATE   ZIP
Johns Geo M          167 Sycamore St   Springfield    OH      53315
Geo M Johns Jobber   167 Sycamore      Spring Field   OH      [illegible]


Insertion or deletion of blanks in corresponding fields typically causes record pairs to be 
designated as a nonmatch. 


Survey Methodology, June 1989 113 


Table 9
Bootstrap 90 Percent Confidence Intervals for Counts of Possible Links
500 Replications

Weight            Ordinary       BC             BC_a
Type      Class   Interval       Interval       Interval

C         1       ( 42,117)      ( 37,108)      ( 37,108)
C         2       [illegible]    [illegible]    [illegible]
C         3       ( 31,154)      ( 34,156)      ( 34,156)
C         4       (  0, 36)      (  0, 39)      (  0, 39)
U         1       (122,192)      (128,196)      (128,196)
U         2       (383,501)      (383,501)      (383,501)
U         3       (149,201)      (142,197)      (142,197)
U         4       ( 35, 52)      ( 33, 81)      ( 33, 81)


3.3 Bootstrap Variation 


The results of this section involve increasingly more sophisticated methods of computing 
bootstrap confidence intervals (Table 9). For each class, 500 replications are used in computing 
90 percent confidence intervals for estimates of the number of records designated as possible 
links. The two error bounds are fixed at 5 percent. 

The first interval is the ordinary bootstrap interval that is partially based on normal theory
(Efron 1979). The second interval, denoted by BC, is an interval in which a bias adjustment
has been made (Efron 1979, 1982). The third interval, denoted by $BC_a$, is obtained using
acceleration-constant type adjustments for bias and skewness (Efron 1987; also Hall 1988).
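
For readers unfamiliar with the adjusted intervals, the following Python sketch computes a
bias-corrected percentile interval from bootstrap replicates using the standard formulas of
Efron; with a nonzero acceleration constant it gives the accelerated version. It is a generic
illustration and not the code behind Table 9. The function name, the clamping of the
bias-correction proportion, and the rounding used to pick order statistics are assumptions, and
the acceleration constant is taken as an input rather than estimated.

```python
from statistics import NormalDist


def bca_interval(replicates, estimate, level=0.90, accel=0.0):
    """Bias-corrected (accel = 0) or accelerated bootstrap percentile interval."""
    nd = NormalDist()
    reps = sorted(replicates)
    b = len(reps)
    below = sum(1 for r in reps if r < estimate)
    # Keep the proportion strictly inside (0, 1) so the normal quantile is finite.
    p = min(max(below / b, 0.5 / b), 1 - 0.5 / b)
    z0 = nd.inv_cdf(p)                              # bias-correction constant

    def endpoint(alpha):
        z = z0 + nd.inv_cdf(alpha)
        adjusted = nd.cdf(z0 + z / (1 - accel * z))
        k = min(b - 1, max(0, round(adjusted * (b - 1))))
        return reps[k]

    tail = (1 - level) / 2
    return endpoint(tail), endpoint(1 - tail)


replicates = list(range(100))
centered = bca_interval(replicates, estimate=50)
shifted = bca_interval(replicates, estimate=70)
```

When half of the replicates fall below the estimate, the bias correction vanishes and the
interval reduces to the plain percentile interval; an estimate sitting high in the bootstrap
distribution shifts both endpoints upward.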

Examination of Table 9 shows that the intervals in respective classes are approximately
the same length. If the method of adjusting to achieve weights of Type C were highly
sensitive to the individual samples taken for calibration, we would expect the confidence
intervals associated with Type C weights to be larger than those associated with Type U weights.

The fact that the intervals are large for either type of weight indicates the results are quite
dependent on the calibrating samples. The fact that the ordinary confidence intervals are
roughly the same as the BC and $BC_a$ intervals indicates that the respective distributions are
neither biased nor skewed.

The counts of possible links in the intervals based on Type C weights are almost always lower
than those in the corresponding intervals based on Type U weights. Only the intervals
associated with classes 3 and 4 show slight overlap. Thus, it is reasonable to accept the
hypothesis that the linkage rule based on Type C weights consistently outperforms the linkage
rule based on Type U weights.


4. DISCUSSION 


This section is composed of four parts. The first contains a discussion of the robustness
of the steepest ascent adjustments. The second describes the implicit type of conditioning
imposed by the steepest ascent adjustments. The third considers the usefulness of making
comparisons that are partially dependent on other comparisons. The fourth describes methods
for determining sets of blocking criteria.


4.1 Robustness of Steepest Ascent Adjustment


The sizes of regions of possible links are somewhat sensitive to the set of weights that are 
varied during the steepest ascent procedure. In two cases (one of which was presented in this 
paper), the numbers of possible links were approximately 100; in two others, 200. All four of 
the steepest ascent variations yielded improvements over the 700 possible links obtained by 
the best non-steepest ascent procedure. 

The individual weights that were modified varied significantly over the four cases. In no 
case were more than eight of the 30 weights varied. 

It is reasonable to hypothesize that the steepest ascent weighting procedure will yield 
improvements when deviations from conditional independence are substantial. No bootstrap- 
based significance tests were used to check the hypothesis for three of the four cases. 

Obtaining small samples that allow adjustments such as performed in this paper should be 
straightforward. Sample sizes of 100 in each class may be sufficient. The sample sizes used 
for the bootstrap results of section 3.3 were approximately 100 in each class. Comparable 
bootstrap results using samples of 30 and 50 in each class were not sufficient to show that 
adjustments yielded quantifiable improvements. Sample sizes of 200 yielded bootstrap con- 
fidence intervals that were almost the same as those based on samples of sizes 100. 

Many record linkage systems (e.g., U.S. Dept. Agriculture 1979; U.S. Dept. of Commerce 
1978a; Statistics Canada 1984) allow modification of matching parameters based on informa- 
tion from samples. Reestimation of parameters using sample information is a powerful feature 
of the Generalized Iterative Record Linkage System of Statistics Canada (1983). The parameter
reestimation in these systems generally involves direct reestimation of the marginal
probabilities $m_i(\gamma^i)$ and $u_i(\gamma^i)$. It does not involve adjustments of weights
such as given in this paper.


4.2 Type of Conditioning Represented by Modified Weights 


To prepare for the discussion in this section, we need two sets of facts. The first set involves
the conditional discriminating power of components of $\gamma$. Let $\sigma$ be a vector with
components $\sigma^1, \sigma^2, \ldots, \sigma^K$ that consists of a reordering of the
components $\gamma^1, \gamma^2, \ldots, \gamma^K$ of $\gamma$. Then

$$\begin{aligned}
P(\gamma \mid M) = P(\sigma \mid M)
 &= P(\sigma^1 = \sigma^1_0,\ \sigma^2 = \sigma^2_0,\ \ldots,\ \sigma^K = \sigma^K_0 \mid M) \\
 &= P(\sigma^1 = \sigma^1_0 \mid M) \cdot P(\sigma^2 = \sigma^2_0 \mid \sigma^1, M) \cdots
    P(\sigma^K = \sigma^K_0 \mid \sigma^1, \sigma^2, \ldots, \sigma^{K-1}, M).
\end{aligned} \tag{4.1}$$

The component $\sigma^1$ might refer to first name, $\sigma^2$ to house number, $\sigma^3$ to
age, and so on.

For each $\sigma$ we can call $P(\sigma^i = \sigma^i_0 \mid \sigma^1, \sigma^2, \ldots, \sigma^{i-1}, M)$
the successive conditional incremental discriminating component of $\sigma^i$ in $M$,
$i = 1, 2, \ldots, K$. These incremental probability components are dependent on the reordering
$\sigma^1, \sigma^2, \ldots, \sigma^K$. Each component on the right hand side of (4.1) is
independent of the others. In a similar manner, we can consider incremental components in $U$.

The basic purpose of a reordering is to consider one specific pattern of conditional
probabilities for $\gamma \in \Gamma$. For the single reordering we let $\sigma = \sigma(\gamma)$ vary in $\sigma(\Gamma)$ as
$\gamma \in \Gamma$. Then for all $\sigma \in \sigma(\Gamma)$,

$$
W(\gamma) = \mathrm{Log}_2[m(\gamma)/u(\gamma)] = A^1 + A^2 + \cdots + A^K
\tag{4.2}
$$

where $A^i = \mathrm{Log}_2[P(\sigma^i = \sigma_0^i \mid \sigma^1, \sigma^2, \ldots, \sigma^{i-1}, M) / P(\sigma^i = \sigma_0^i \mid \sigma^1, \sigma^2, \ldots, \sigma^{i-1}, U)]$
for $i = 1, 2, \ldots, K$.
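
The decomposition of a total weight into successive conditional components can be checked numerically. The following is a minimal sketch, not the procedure used in this paper: it assumes two binary comparison fields, invented joint agreement probabilities in M and U, and logarithms to base 2. All names and numbers are hypothetical. It shows that the incremental components change with the reordering while their sum does not, and that the sum can differ sharply from the weight obtained under the independence assumption.

```python
import math
from itertools import permutations

# Hypothetical joint agreement probabilities for two binary comparison
# fields (1 = agree, 0 = disagree) among matches (M) and nonmatches (U).
# The values are illustrative only; the fields are deliberately dependent.
P_M = {(1, 1): 0.80, (1, 0): 0.05, (0, 1): 0.05, (0, 0): 0.10}
P_U = {(1, 1): 0.02, (1, 0): 0.08, (0, 1): 0.08, (0, 0): 0.82}


def marginal(joint, fixed):
    """Probability that the fields in `fixed` (index -> value) take those values."""
    return sum(p for pattern, p in joint.items()
               if all(pattern[i] == v for i, v in fixed.items()))


def incremental_weights(pattern, order):
    """Successive conditional log-ratio components for one reordering of the fields."""
    terms = []
    seen = {}
    for field in order:
        before_m, before_u = marginal(P_M, seen), marginal(P_U, seen)
        seen[field] = pattern[field]
        cond_m = marginal(P_M, seen) / before_m  # P(field | earlier fields, M)
        cond_u = marginal(P_U, seen) / before_u  # P(field | earlier fields, U)
        terms.append(math.log2(cond_m / cond_u))
    return terms


pattern = (1, 0)  # first field agrees, second disagrees
total_weight = math.log2(P_M[pattern] / P_U[pattern])

# Weight that would be used if the fields were treated as independent.
independence_weight = sum(
    math.log2(marginal(P_M, {i: pattern[i]}) / marginal(P_U, {i: pattern[i]}))
    for i in range(len(pattern))
)

for order in permutations(range(len(pattern))):
    terms = incremental_weights(pattern, order)
    print(order, [round(t, 3) for t in terms], round(sum(terms), 3))
print("total:", round(total_weight, 3), "independence:", round(independence_weight, 3))
```

In this toy case the weight computed under independence is positive while the true total weight is negative, which is the kind of discrepancy that adjusted weights are meant to reduce.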

The second set of facts involves transformations that map the ratio R given by (2.1) to real 
numbers which we call weights. For each pair of Type I and Type II errors, we consider any 
transformation that places weights associated with links in the highest interval, weights 
associated with nonlinks in the lowest interval, and weights associated with possible links in 
the interval between the upper and lower intervals. Such a transformation yields rules that can 
be represented in forms similar to form (2.2) and are equivalent to the Fellegi-Sunter rule at 
the same fixed pair of error levels. If the transformation is monotone, then the new weights 
yield rules that are equivalent to the original Fellegi-Sunter rule for all error levels. 

The steepest ascent weight adjustment procedure implicitly determines a transformation of 
the ratio R and a single reordering that is fixed for all $\gamma \in \Gamma$ and the same in M and U. The fact
that the steepest ascent procedure adjusts weights sequentially assures that there is a single
reordering. The adjusted weights $W^i + c_i$ are estimates that replace the $W^i$ in (2.3) for some
real constants $c_i$, $i = 1, 2, \ldots, K$.

The fact that the adjusted weights yield smaller regions of possible links means that, at a 
fixed pair of error levels, the new total weights more accurately represent a transformation 
of the $\mathrm{Log}_2$ of the ratio of the true probabilities given by the left hand side of (4.1). The new total
weights represent estimates that transform the right hand side of (4.2). 

The adjustment procedure allows us to utilize better the incremental distinguishing power 
of one field given another, a second field given the first two, and so on. We note that we do 
not need to know the specific transformation or the specific pattern of conditioning induced 
by the reordering. 

The adjustment procedure is similar to new bootstrap procedures (Efron 1987; Hall 1988). 
The validity of the bootstrap procedures is dependent on the existence of monotone transfor- 
mations, bias constants, and acceleration constants that yield the exact correspondence of con- 
fidence intervals of the original distributions with confidence intervals of specified normal 
distributions. The transformations and constants need not be known. 


4.3 Value of Dependent Comparisons 


The intuitive idea of making a number of comparisons, some of which may be partially 
dependent on other comparisons, is that they may, when used in properly adjusted rules, yield 
additional distinguishing power. Newcombe and Kennedy (1962, see also Newcombe et al. 1983) 
have given examples of comparisons of portions of name fields that intuitively may be depen- 
dent on other comparisons. The additional comparisons, nevertheless, may yield better linkage 
rules than those rules that do not utilize the same additional comparisons. 

The chief difficulty in using additional comparisons is properly utilizing their incremental 
distinguishing power. This paper’s set of comparisons - in particular, of subfields of the name 
field - is not independent in the sense of equation (2.3). The primary purpose of the set is to 
illustrate methods for systematically obtaining better linkage rules when the conditional inde- 
pendence assumption is not valid. 




4.4 Additional Blocking Criteria 


There are two conflicting goals when a set of blocking criteria is used to reduce the number of 
pairs in A × B that receive further processing. The first is the need to reduce (drastically) the number
of pairs that are processed and to obtain a set in which linkage rules can accurately delineate matches 
and nonmatches. The second is to obtain a set that contains as many matches from M as possible. 

To determine whether it is feasible to look for additional sets of blocking criteria, it is first 
necessary to find estimates of the number of matches missed by a given set of blocking criteria. 
If the estimates are acceptably small, then it is not necessary to look for additional criteria. 

To estimate the number of matches missed by given sets of blocking criteria, Scheuren (1983) 
suggested using standard capture-recapture techniques such as given in Bishop, Fienberg, and 
Holland (1975, Chapter 6). Winkler (1987) applied the techniques to the same empirical data 
and four sets of blocking criteria as in this paper. 

The best fitting loglinear model for the table of counts of records captured and not cap- 
tured by the four sets of blocking criteria was used in obtaining a confidence interval for the 
number of matches missed. Based on assumed asymptotic normality, a 95 percent confidence 
interval (27,160) was computed. The interval represents between 1 and 5 percent of the matches. 
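
The idea behind the capture-recapture estimate can be illustrated with a much simpler calculation than the loglinear model used above. The sketch below swaps in the two-list Lincoln-Petersen estimator, which treats two sets of blocking criteria as independent "captures" of the matches. The function name and all counts are hypothetical and do not reproduce the empirical data of this paper.

```python
def estimate_missed_matches(found_by_first, found_by_second, found_by_both):
    """Two-list Lincoln-Petersen estimate of matches missed by both criteria.

    Assumes the two blocking criteria capture matches independently,
    which is a simplification of the loglinear approach.
    """
    if found_by_both <= 0:
        raise ValueError("need at least one match captured by both criteria")
    only_first = found_by_first - found_by_both
    only_second = found_by_second - found_by_both
    # Petersen estimate of the total minus the matches actually observed.
    return only_first * only_second / found_by_both


# Hypothetical counts of true matches captured by each set of criteria.
found_by_first, found_by_second, found_by_both = 3000, 2800, 2600
missed = estimate_missed_matches(found_by_first, found_by_second, found_by_both)
observed = found_by_first + found_by_second - found_by_both
print(f"observed matches: {observed}, estimated missed: {missed:.1f}")
print(f"estimated share missed: {100 * missed / (observed + missed):.2f} percent")
```

If the estimated share missed is acceptably small, additional blocking criteria are unlikely to be worth their processing cost.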


5. SUMMARY 


The results of this paper show that the conditional independence assumption is not always 
valid. When the assumption is not valid, it is possible to develop adjusted linkage rules that 
improve on the standard linkage rule. Under fixed bounds on error rates, the improved rules 
reduce the size of the region of possible links. 
Automated Quality Assurance Processing of
Administrative Record Files 


JAMES R. JONAS and PAUL S. HANCZARYK¹


ABSTRACT 


The Census Bureau makes extensive use of administrative records information in its various economic 
programs. Although the volume of records processed annually is vast, even larger numbers will be received 
during the census years. Census Bureau mainframe computers perform quality control (QC) tabulations 
on the data; however, since such a large number of QC tables are needed and resources for program- 
ming are limited and costly, a comprehensive mainframe QC system is difficult to attain. Add to this 
the sensitive nature of the data and the potentially very negative ramifications from erroneous data, and 
the need becomes quite apparent for a sophisticated quality assurance system on the microcomputer level. 
Such a system is being developed by the Economic Surveys Division and will be in place for the 1987 
administrative records data files. The automated quality assurance system integrates micro and main- 
frame computer technology. Administrative records data are received weekly and processed initially 
through mainframe QC programs. The mainframe output is transferred to a microcomputer and for- 
matted specifically for importation to a spreadsheet program. Systematic quality verification occurs within 
the spreadsheet structure, as data review, error detection, and report generation are accomplished 
automatically. As a result of shifting processes from mainframe to microcomputer environments, the
system eases the burden on the programming staff, increases the flexibility of the analytical staff,
reduces processing costs on the mainframe, and provides the comprehensive quality assurance
component for administrative records.


KEY WORDS: Mainframe-microcomputer integration; Systematic data verification; Timeliness. 


1. INTRODUCTION 


The Bureau of the Census makes extensive use of administrative record information in our 
economic programs. The data originate from the business-related tax collection processes of 
the Internal Revenue Service (IRS) and, to a lesser extent, the Social Security Administration. 
During economic and agriculture census years, the volume of administrative record data
received increases substantially. These data have enabled us to conduct economic and agriculture 
censuses on a timely and efficient basis and with a minimum of reporting burden on the busi- 
ness and farm communities. The success of our economic and agriculture programs depends 
to a great extent on the timeliness and quality of these administrative record files. 


It is vital for Census Bureau operations to ensure the quality of all incoming data. As in 
past economic censuses, we have developed mainframe quality assurance programs for the 
administrative record data. However, since such a large number of quality assurance tables are needed and
resources for programming are limited and costly, a comprehensive quality assurance system 
is difficult to attain entirely on the mainframe. Add to this the sensitive nature of these data 
and the potential ramifications of erroneous data, and the need for a more sophisticated quality 
assurance system becomes apparent. The Census Bureau has developed a comprehensive quality 
assurance system that manages various phases of our administrative records review process.
This automated system will allow us to perform more thorough quality assurance within the 
bounds of restrictive budgets and limited programming resources. 

The automated quality assurance system integrates mainframe computer and microcomputer
technology. The Census Bureau has established standards that delineate our fundamental
requirements of the incoming administrative record data set. These standards are entered into a
microcomputer system. After the mainframe quality assurance programs are run, the results are
downloaded into the same microcomputer system. The reporting patterns of the actual
administrative record data are then compared to the predetermined standards. Mechanical data
verification occurs as data review, error detection, and report generation are accomplished
automatically at the microcomputer level. As a result of shifting processes from mainframe to
microcomputer environments, the system eases the burden on the programming staff, increases the
flexibility of the analytical staff, and reduces the processing costs on the mainframe. Moreover, the
system provides the quality assurance component needed for thorough and unerring review of
administrative records. Although designed specifically for the IRS business income tax return files
used in the censuses, it can and will be adapted to all incoming administrative record files after 1988.


2. OVERVIEW OF QUALITY ASSURANCE SYSTEM FROM 
A MANAGEMENT PERSPECTIVE 


Administrative records play a major role at the Census Bureau, a role that has steadily grown 
in importance over time. The increasing need for more and better statistics, the need to com- 
pile those statistics with a minimum of burden on the private sector, and the need to use our 
available human and financial resources as efficiently as possible have all contributed to the 
importance of administrative records. 

Over the past several years, the quality of the administrative records generally has been 
excellent. However, we did experience certain problems with the quality of the 1982 business 
income tax data from the IRS. The most detrimental problem was the inadequate quality of 
the principal industrial activity codes for sole proprietorships. As a result of this problem, the 
Census Bureau published only limited statistics for nonemployers in the 1982 Economic Cen- 
suses. If our quality assurance programs had been more sophisticated, the errors could have 
been identified earlier and the negative impact would have been minimized. 

Heading into the 1987 Economic Censuses, it was determined that additional measures were 
needed to ensure the quality of administrative record data received from the IRS. An overall 
quality management system responsive to certain factors that have adversely affected past admin- 
istrative data sets was necessary. The three major factors that have plagued us in the past are: 


1. Vast amounts of administrative record data 


The IRS will provide us with selected business 1987 tax return data (received in 1988) for 
various legal forms of businesses, including corporations, S corporations, foreign corpora- 
tions, partnerships, nonprofit organizations, and sole proprietorships. In total, the Census 
Bureau expects over 75 million tax return records in 1988. Table 1 details the approximate 
number of administrative records that will be used in the 1987 Economic and Agriculture 
Censuses for the various form types. Clearly, the number of data records received during 
census years is immense, but the complexity of the required quality assurance goes beyond 
sheer volume. A data record often contains several data items, each greatly increasing the 
detail of the individual records and the entire data files. Moreover, not all form types
contain the same set of data items, nor do they have the same pattern of receipt. Consequently,
in addition to performing quality review for over 75 million individual records, the
Census Bureau must also be concerned with assuring the quality of the various data items
on those 75 million records.


Table 1

The Approximate Number of Administrative Records Used in the 1987 Economic and
Agriculture Censuses for the Various Form Types by Tax Year

                                        Number of Records, by Tax Year

Type of Record                             1985         1986         1987

Business Income Tax Files             2,617,000   20,051,000   30,881,000
  Form 1040, Schedule C                       -   11,750,000   12,500,000
  Form 1040, Schedule F               2,450,000    2,450,000            -
  Form 1040, Schedule SE                      -            -   10,000,000
  Form 1120                              42,000    2,550,000    2,650,000
  Form 1120-A                                 -      200,000      210,000
  Form 1120F                                  -       11,000       11,000
  Form 1120S                             17,000      900,000      950,000
  Form 1065                             108,000    1,750,000    1,800,000
  Form 990                                    -      380,000      400,000
  Form 990-PF                                 -       35,000       35,000
  Form 990-T                                  -       25,000       25,000
  Form 1120S, Schedule K-1                    -            -      700,000
  Form 1065, Schedule K-1                     -            -    1,600,000
Annual Tax Files                     41,950,000   43,500,000   45,050,000
  IRS Business Master File           24,000,000   25,000,000   26,000,000
  IRS Payroll and Employment File    17,000,000   17,500,000   18,000,000
  SSA Business Birth File               950,000    1,000,000    1,050,000
Total                                44,567,000   63,551,000   75,931,000


Additionally, businesses file their tax returns with one of ten IRS centers. Each of the indi- 
vidual centers processes the returns, and the quality of data received from different ser- 
vice centers can vary. The Census Bureau reviews data at the service center level in response 
to such variation. 


2. Restrictive budgets 


Restrictive budgets are another major factor that contributes to the difficulty of assuring
the quality of the administrative record data. In keeping with the overall governmental 
policy on spending, the Census Bureau is attempting to provide greater services at less cost. 
Workloads for programming staffs increase significantly during census years, yet the staffs 
do not expand proportionately. The quality assurance processing, which relies considerably 
on various computer resources, can be adversely affected. It is also important to note that 
most quality assurance processing is traditionally done at the mainframe computer levels. 
Use of the Census Bureau’s mainframe computer is costly and becomes more so as increas- 
ingly larger data files are processed. 


3. Lack of communication between agencies 


Miscommunication or lack of communication between agencies has contributed to past 
administrative record problems. Clear lines of communication between the Census Bureau 
and the agency providing the data during all phases of the procurement process also are 
essential for assured data quality. The agencies first must agree upon the data files and the
specific data items that are needed and that can be provided. Certain data that the Census
Bureau requests may not be available or in some cases affordable. Any discrepancies must 
be resolved in time to avoid delays, which could affect data utility. Moreover, the agencies 
must agree upon the expected quantity and quality of the administrative data. Requirements 
that quantify the Census Bureau’s expectations of the incoming data should be established. 


The development and implementation of the quality assurance system represent a com- 
prehensive response to the administrative record data problems we encountered in the past. 
The system provides for the review of large and complex IRS data files, promotes frequent 
interagency communication, and identifies errors instantly. The major element of the quality 
assurance system is the mechanized data verification. Basically, the Census Bureau establishes 
standards that detail our fundamental requirements of the incoming IRS data. The reporting 
patterns of the actual data are compared to these standards, and systematic data verification 
occurs at the microcomputer level. The Census Bureau then prepares status reports indicating 
whether the data conform to the standards.

Census Bureau staff members develop the standards far in advance of the actual receipt 
of the data. This gives the IRS ample opportunity to examine the requirements for 
reasonableness and request adjustments if necessary. The requirements are divided into timing 
standards and quality standards. The timing standards list the estimated total number of tax 
returns for the different types of businesses and the estimated number to be received by various 
dates. The quality standards detail the expected reporting patterns of specific data items. 

The mechanized data verification technique simplifies our analytical review process. A series 
of results tables are created that compare the actual data to the expected standards. Discrepancy 
flags are set for those data components that do not meet the standards. This approach minimizes 
the risk of analytical omissions during the review process. 

Status reports comparing the reporting patterns of actual data to the pre-determined stan- 
dards are sent to the IRS monthly. These status reports are a subset of the comprehensive results 
tables, detailing only the basic requirements of the IRS data set. The status reports promote 
communication between the agencies. If data problems exist, they are illustrated in the report. 
Immediately, the Census Bureau and the IRS must decide upon any remedial action or recovery 
efforts necessary to prevent compromising the censuses. Timeliness is crucial because the IRS 
data tapes are not kept indefinitely. If errors are not identified early and remedial action is not 
implemented in time, recovery of the data may not be possible or may become extremely costly. 

The quality assurance system is not designed to guarantee that administrative data prob- 
lems will never occur. It does serve, however, to document our requirements formally so that 
the characteristics of the data set are not left to chance, and monitoring and early error iden- 
tification are possible. 


3. DETAILS OF AUTOMATION 


Administrative record data files are received weekly and processed initially through
mainframe quality assurance programs. The mainframe programs are prepared well before the
administrative data files are received and generate the initial quality assurance tables that are
fundamental to the entire review process. Traditionally, mainframe programmers were
responsible for creating the entire data tables, which included data cells and the surrounding text
(i.e., headers and stubs). However, for the data table programs associated with the 1987 Economic
Censuses, the two data table components are handled separately. Data tabulation is performed
as usual at the mainframe level whereas table text is created at the microcomputer level by
nonprogrammers. A procedure has been developed that generalizes data tables for all administrative
records files. This procedure has allowed the Census Bureau to design a microcomputer program
that is capable of building table images for any administrative records file. Once built, the table
images are uploaded to the mainframe and used by programmers to align data tabulation files.
The job of programming the quality assurance tables is greatly simplified, as table image
formation is handled by nonprogrammers, leaving mainframe programmers adequate time to
concentrate their efforts solely on data tabulations. Table 2 illustrates one of the various mainframe
tables that are produced for each of the different forms of organization. This table shows the
weighted distribution of Form 1040, Schedule C records by service center by net receipts size class.


Table 2

Weighted Distribution of Form 1040 Schedule C Records by
Net Receipts Size Class by Service Center

                                     Net Receipts Size Class (000)

                                               Blank
Service Center       Total  [illegible]        or 0      1-2,499  2,500-4,999  5,000-9,999

All Centers      1,327,100          200      52,200      149,300       73,900       98,100
Atlanta            133,200            0       5,100       16,500        6,300       11,000
Philadelphia       132,100          100       4,200       11,300        5,300        9,600
Austin             147,600            0       6,300       20,900        9,900       12,900
Cincinnati         153,100            0       5,300       14,900        8,700        9,800
Kansas City        119,500            0       5,500       16,700        7,500        8,500
Andover            111,100            0       3,800        9,800        6,700        8,200
Ogden              162,300            0       7,500       20,200        7,900       11,600
Brookhaven         119,700            0       4,400       12,600        7,100       10,000
Memphis            111,900          100       4,700       14,700        6,700        8,600
Fresno             136,500            0       5,400       11,700        7,800        7,900
Others                 100            0           0            0            0            0

                                     Net Receipts Size Class (000)

                   10,000-      25,000-      50,000-     100,000-     250,000-
Service Center      24,999       49,999       99,999      249,999      499,999     500,000+

All Centers        168,600      185,500      225,100      243,400       87,400       43,400
Atlanta             17,000       19,800       22,200       22,200        8,400        4,700
Philadelphia        17,800       19,800       22,700       27,000       10,100        4,200
Austin              18,700       18,500       22,000       24,900        9,100        4,400
Cincinnati          20,500       20,700       27,300       30,500        9,600        5,800
Kansas City         16,200       15,900       20,700       18,300        6,400        3,800
Andover             13,600       16,700       19,500       20,000        8,800        4,000
Ogden               17,800       19,500       28,800       33,600       11,200        4,200
Brookhaven          16,400       19,700       20,400       19,400        6,400        3,300
Memphis             15,100       14,700       18,600       19,000        6,800        2,900
Fresno              15,500       20,200       22,900       28,400       10,600        6,100
Others                   0            0            0          100            0            0

The mainframe computer performs only the basic data tabulations of the administrative
records files (i.e., generates current tables). The output from these mainframe quality assurance
programs is downloaded to a microcomputer, and all remaining review operations are
automated at the microcomputer level. The various operations performed on the
microcomputer include calculating percentages used in the review of the current tables, producing
cumulative tables, performing key data item verification, and generating quality assurance
status reports. Developing this systematic approach, using mostly microcomputer technology,
has allowed greater flexibility of review as well as lessened the workload of mainframe
programmers.

The mainframe quality assurance output is imported into a prestructured spreadsheet on 
the microcomputer. This spreadsheet also will contain the predetermined standards that outline 
the Census Bureau’s expectations of the incoming data set. Automatically, a mechanical table 
review and data verification are performed; and inconsistencies between the actual data sets 
and the standards are identified within the results tables. The two major benefits of this data 
verification system are: 


1. It enables us to easily spot problems in the data. Data components that do not meet the 
standards are flagged for analyst review. The possibility of overlooking errors in the 
administrative data is minimized. 


2. It directs us to areas of the data that require further investigation. The results tables often- 
times lead us to problems even though the overall standards are met. For example, certain 
unexpected trends in the results report are reviewed in additional detail. In effect, the results 
tables enable us to concentrate on those areas that may contain problems. This may involve 
additional review at the service center level, or it may even require us to download records 
with these certain characteristics to the microcomputer. We then review these records on 
a manual basis in an effort to spot the problem. 


As previously stated, the standards detail the basic data quality requirements that are essential 
to the 1987 Economic and Agriculture Censuses. This procedure of automatic quality verifica- 
tion (i.e., comparing the incoming data to predetermined standards) allows us to determine 
immediately if the basic quality of the incoming data is acceptable. 

After current cycle review and verification, cumulative tables are prepared on the microcom- 
puter. This technique of producing cumulative tables on the microcomputer rather than the 
mainframe provides a more efficient use of our resources. First, it eliminates the need to retain 
cumulative files on the mainframe system, which reduces computer costs. In the past, these 
cumulative files were retained on the mainframe and added to each subsequent current cycle 
to form the next set of cumulative tables. Using microcomputers, simple formulas were 
established within the spreadsheet that created cumulative tables at virtually no cost. Secondly, 
the quality assurance tables for the cumulative portion do not require mainframe
programming. A printout of the cumulative quality assurance tables is produced and retained for
analysis and documentation purposes.
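
The cumulative step amounts to adding each current cycle to the running totals. The system described here does this with spreadsheet formulas; the sketch below expresses the same arithmetic in Python purely for illustration. The function name and the weekly counts are hypothetical.

```python
from collections import Counter


def add_cycle(cumulative, current):
    """Return a new cumulative table: previous cumulative plus the current cycle."""
    updated = Counter(cumulative)
    updated.update(current)  # adds counts cell by cell, creating new cells as needed
    return dict(updated)


# Hypothetical weekly counts of returns by service center.
weekly_cycles = [
    {"Atlanta": 5100, "Austin": 6300},
    {"Atlanta": 4900, "Austin": 7000, "Ogden": 7500},
]

cumulative = {}
for cycle in weekly_cycles:
    cumulative = add_cycle(cumulative, cycle)

for center, count in sorted(cumulative.items()):
    print(f"{center:<10}{count:>8,}")
```

Only the small current-cycle table has to come from the mainframe each week; the running totals stay on the microcomputer.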

In addition to this comprehensive set of cumulative tables, we produce a set of results tables.
As was the case with the current cycle, these results tables detail comparisons of certain key
data items. Table 3 shows one of the many results tables that are produced for the cumulative
quality assurance. This table details the actual number and percent of the weighted Form 1040,
Schedule F records by service center, together with the expected percent. As can be seen, the
cumulative data are reasonable and fall within the acceptable standards. If inconsistencies had
existed, the applicable service center would have been flagged. The final component of the
automated quality review process is the generation of a report detailing the status of the
cumulative IRS data file. This report compares the overall quality of the data set to the expected
quality indicated in the timing and quality standards. The reports are generated and provided
to the IRS approximately monthly. As discussed earlier, the status reports summarize the quality
of the administrative data for representatives of both agencies, which promotes frequent
interagency communication.
4. RESULTS OF QUALITY ASSURANCE REVIEW


The timing and quality status reports can serve to alert both the Census Bureau and the IRS
to data problems in their early stages and facilitate cooperative action by both agencies. In
most cases, however, the timing and quality standards alert us to changes in respondent
reporting patterns. These circumstances require no corrective action by the IRS, but they
may have cost and processing implications for the Census Bureau in the 1987 Economic and
Agriculture Censuses. Tables 4a and 4b illustrate this point well. Through late May 1987, the
Census Bureau had received approximately 697,600 Form 1120 returns (i.e., corporations)
against a standard of 760,000 returns. The standard for the number of Form 1120 returns was not met.
However, the shortfall in the number of Form 1120 returns was offset by an increase in the
number of Form 1120S returns (i.e., S corporations). The Census Bureau had received
approximately 328,850 Form 1120S returns, far exceeding the standard of 225,000. The shift in the
number of returns for these two types of corporations resulted from the perceived advantages in
the new tax law associated with filing Form 1120S rather than Form 1120. Although this
represented a legitimate shift in taxpayer reporting patterns rather than a data error, the information
was pertinent to our processing. We are implementing a procedure for 1987 that will account
for such a shift from corporations to S corporations. Table 5 illustrates one of the various tables
from the quality portion of the report. As indicated, the quality of these data meets the
standards for each of the basic data items. If an item had failed the standard, it would have been
flagged for analyst research.
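
The timing comparison described above can be sketched as follows, using the Form 1120 and Form 1120S figures quoted in the text. Treating the required figure as a minimum is an assumption that is consistent with the flags shown in Tables 4a and 4b; the class and variable names are hypothetical and are not part of the Census Bureau system.

```python
from dataclasses import dataclass


@dataclass
class TimingCheck:
    form: str
    actual: int     # weighted number of returns received to date
    required: int   # timing standard for the same date

    @property
    def difference(self) -> int:
        return self.actual - self.required

    @property
    def satisfied(self) -> bool:
        # Assumption: the standard is a minimum number of returns.
        return self.actual >= self.required


checks = [
    TimingCheck("Form 1120", 697_600, 760_000),
    TimingCheck("Form 1120S", 328_850, 225_000),
]

for check in checks:
    flag = "" if check.satisfied else "Not Satisfied"
    print(f"{check.form:<12}{check.actual:>10,}{check.required:>10,}{check.difference:>+10,}  {flag}")

# Looking at the two corporate forms together shows the offsetting shift.
combined_actual = sum(c.actual for c in checks)
combined_required = sum(c.required for c in checks)
print(f"{'Combined':<12}{combined_actual:>10,}{combined_required:>10,}")
```

A flag on one form accompanied by a surplus on a related form is a signal to look for a change in reporting patterns before treating the shortfall as a data problem.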


Table 3
Percent of Weighted 1986 Form 1040, Schedule F Records by Service Center

                                               Service Centers

                     Total
Tax Year         Schedules     Atlanta  Philadelphia      Austin   Cincinnati  Kansas City

1986
  Count          2,087,200     176,700        71,600     374,900      262,100      358,600
  Percent            100.0         8.5           3.4        18.0         12.6         17.2
  Expected
  Percent            100.0         8.5           3.0        18.5  [illegible]         17.5

Expectation¹
Not Satisfied

                                               Service Centers

Tax Year           Andover       Ogden    Brookhaven     Memphis       Fresno       Others

1986
  Count            118,800     343,200        40,300     288,100       52,500          400
  Percent              5.7        16.4           1.9        13.8          2.5          0.0
  Expected
  Percent              5.5        16.5           2.0        14.0  [illegible]          0.0

Expectation¹
Not Satisfied

¹ Acceptance interval of + or - 2.0 percent.
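
The check behind Table 3 can be sketched as below, using the counts and the legible expected percents from the table. Reading the acceptance interval as plus or minus 2.0 percentage points is an assumption, and the function and variable names are hypothetical.

```python
TOLERANCE = 2.0          # assumed to mean percentage points
TOTAL_SCHEDULES = 2_087_200

# (weighted count, expected percent) for centers whose expected percent is legible.
centers = {
    "Atlanta": (176_700, 8.5),
    "Philadelphia": (71_600, 3.0),
    "Austin": (374_900, 18.5),
    "Kansas City": (358_600, 17.5),
    "Andover": (118_800, 5.5),
    "Ogden": (343_200, 16.5),
    "Brookhaven": (40_300, 2.0),
    "Memphis": (288_100, 14.0),
}


def review(table, total, tolerance):
    """Return {center: (actual percent, expected percent, flagged)}."""
    results = {}
    for name, (count, expected) in table.items():
        actual = 100.0 * count / total
        flagged = abs(actual - expected) > tolerance
        results[name] = (round(actual, 1), expected, flagged)
    return results


results = review(centers, TOTAL_SCHEDULES, TOLERANCE)
flagged = [name for name, (_, _, flag) in results.items() if flag]

for name, (actual, expected, flag) in results.items():
    print(f"{name:<14}{actual:>6.1f}{expected:>6.1f}  {'Not Satisfied' if flag else ''}")
print("flagged centers:", flagged)
```

Recomputing the percents from the counts also provides a check on the printed table itself.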
Table 4a
The Weighted Number of 1986 Form 1120 Returns by Various Dates

                           Form 1120 Returns
                                                    Requirement
Date                      Actual      Required      Not Satisfied

Late March 1987          326,500       303,000
Late April 1987          697,600       760,000      Not Satisfied
Late May 1987                          988,000
Late June 1987                       1,190,000
Late July 1987                       1,418,000
Late August 1987                     1,621,000
Late January 1988                    2,077,000
Late October 1988                    2,533,000


Table 4b
The Weighted Number of 1986 Form 1120S Returns by Various Dates

                           Form 1120S Returns
                                                    Requirement
Date                      Actual      Required      Not Satisfied

Late March 1987          103,350        90,000
Late April 1987          328,850       225,000
Late May 1987                          292,000
Late June 1987                         352,000
Late July 1987                         420,000
Late August 1987                       480,000
Late January 1988                      615,000
Late October 1988                      750,000


The automated quality assurance of administrative records files will be completely opera- 
tional for the 1987 IRS data files. Prototypes of the system have been and are being used for 
the 1985 and 1986 IRS business income tax files. For both years the automated process and 
the entire quality assurance system have been instrumental in the successful procurement and 
review of the IRS data files received for the censuses. 

The integration of both mainframe and microcomputer technology in the automated quality 
assurance system has allowed the Census Bureau to effectively and comprehensively assure 
the quality of the large data files provided by the IRS. In addition, mainframe computer pro- 
grammer workloads have been and will continue to be lessened since much of the automation 
was designed and is controlled by nonprogramming staff and is implemented in a
microcomputer environment. Use of mainframe computer resources is reduced and the programming
burden is lessened, allowing programmers to concentrate their efforts on basic data tabulation. Also
important, the automated system provides the flexibility of review for different levels of person- 
nel. Managers can review the summarized timing and quality report and determine the status 
of the business income tax files quickly and efficiently. Subject-matter analysts will review the 
more comprehensive quality assurance reports that are produced weekly. As mentioned above, 
the quality assurance system will direct the analysts to the data elements that require further 
investigation. 


Survey Methodology, June 1989


Table 5
Data Element Reporting Patterns of Weighted 1986 Form 1120S Returns

                                                     Percent of
                                                 Form 1120S Returns         Requirement
Data Elements                                   Actual    Required          Not Satisfied

EIN
  Blanks, all zeros, or nonnumerics                0.0    Less than 1.0
  Invalid IRD                                      0.0    Less than 1.0

PBA CODE
  Blanks or nonnumerics                            0.0    Less than 6.0
  Blanks, nonnumerics, unclassified, or
    invalid PBA codes                      [illegible]    Less than 18.0

GROSS RECEIPTS OR SALES LESS
RETURNS AND ALLOWANCES
  Blanks, all zeros, or nonnumerics               20.9    Less than 40.0
  Of records with a positive numeric entry,
  the percent in various size ranges:
    - Less than $100,000                          45.7    30.0 - 60.0
    - Greater than or equal to $100,000 and
      less than $500,000                          36.9    20.0 - 50.0
    - Greater than or equal to $500,000           17.4    10.0 - 30.0

ACCOUNTING PERIOD
  Blanks, all zeros, or nonnumerics                0.0    Less than 1.0


5. SUMMARY 


The Census Bureau has designed an overall quality assurance system that is comprehensive 
and responsive to the potential problems and limiting factors of complete quality assurance. 
The system responds to the large volumes of IRS data by interacting with the IRS closely and 
promptly to ensure proper data procurement. The expected quality of these large data files 
is jointly determined and agreed upon with the IRS through the timing and quality standards 
and is verified by the automated QC process. Given this automated process, data verification 
can occur within the bounds of restrictive budgets and limited programming resources. 
Microcomputer technology has increased the role and flexibility of subject-matter analysts while 
lessening the burden of mainframe programmers. Communication with the IRS is frequent 
and productive, resulting in efficient procurement procedures and improved data quality 
awareness on the part of IRS and the Census Bureau as well. This collective response to past 
difficulties will ensure the Census Bureau of receiving the data necessary to conduct the 1987 
Economic and Agriculture Censuses in the best manner possible. 
Using Administrative Record Data to Evaluate 
the Quality of Survey Estimates 


JEFFREY C. MOORE and KENT H. MARQUIS¹ 


ABSTRACT 


The Survey of Income and Program Participation (SIPP) is a new Census Bureau panel survey designed 
to provide data on the economic situation of persons and families in the United States. The basic datum 
of SIPP is monthly income, which is reported for each month of the four-month reference period preceding 
the interview month. The SIPP Record Check Study uses administrative record data to estimate the quality 
of SIPP estimates for a variety of income sources and transfer programs. The project uses computerized 
record matching to identify SIPP sample persons in four states who are on record as having received 
payments from any of nine state or Federal programs, and then compares survey-reported dates and 
amounts of payments with official record values. The paper describes the project in detail and presents 
some early findings. 


KEY WORDS: SIPP; Record check; Record linkage; Survey response validity. 


1. INTRODUCTION 


This paper addresses issues concerning the use of records to evaluate the quality of survey 
estimates and describes a specific application to the Survey of Income and Program Participa- 
tion (SIPP) in the United States. 

Matching administrative records to survey observations on a case-by-case basis, which we 
call a ‘‘record check,’’ provides useful information to survey users and designers. A record 
check enables the analyst to make a full range of measurement error parameter estimates for 
evaluation purposes. These estimates, in turn, facilitate two basic kinds of activities: 


1. quantifying the effects of measurement errors on subject-matter estimates such as means, 
proportions, correlation coefficients, and multivariate regression coefficients (and 
possibly adjusting the estimates to correct for the measurement errors), and 

2. deriving more efficient survey designs that directly address, for example, the tradeoffs 
between measurement quality and costs. 


1.1 Basic Terms 


Our focus here is on measurement (or ‘‘response’’) errors, although the record check method 
can be extended to evaluate other nonsampling and sampling errors also. This is not a tech- 
nical exposition, but we do need to define some of our basic terms first. We assume that the 
survey observation from sample element i can be expressed as the sum of the true value and 
an error, e: 


$\mathrm{Survey}_i = \mathrm{True}_i + e_i.$ 


¹ Jeffrey C. Moore and Kent H. Marquis, Center for Survey Methods Research, U.S. Bureau of the Census, Room 
433 Washington Plaza Building, Washington, DC, 20233. This is a revised version of a paper presented at Statistics 
Canada’s International Symposium on Statistical Uses of Administrative Data, November 23-25, 1987. This paper 
presents the views of the authors and does not necessarily represent official Census Bureau policy or opinions. 
The average bias in a set of N survey observations, which we call the response bias or survey 
bias, is 


$\bar{e} = \sum_i e_i / N,$ 


and the response error variance is just Var e. 
Similarly, the measurement model for the administrative record observation is: 


$\mathrm{Record}_i = \mathrm{True}_i + u_i,$ 


so that record bias is $\bar{u}$ and record error variance is Var u. 


1.2 Comparison of Evaluation Approaches 


The capabilities of the record check approach can be contrasted with other methods of 
evaluation such as reinterviews and experiments. Reinterviews and other repeated measures 
designs aim at estimating a very limited set of measurement error parameters, usually something 
called the simple response variance or the response error variance. These approaches implicitly 
make strong assumptions about true change over time and about either the true value or bias 
parameter (Marquis 1986). 

One frequently attempted remedy is to create a true value measurement as part of the 
reinterview program, for example by reconciling discrepant answers with a knowledgeable respondent 
or by asking much more detailed and specific questions during the reinterview. But the validity 
of these ‘‘true value’’ measures is suspect. Both Bailar (1968) and Koons (1973) have shown, 
for example, that reconciled reinterview responses are biased. And while detailed, specific ques- 
tioning is often preferred to a more global approach, there is no independent evidence that 
it reduces measurement biases to zero — or at all. Record checks potentially provide higher 
quality criterion information requiring much weaker (and perhaps more realistic) assumptions 
for purposes of estimating survey data quality. 

A different method of evaluating aspects of surveys is the experiment, such as a fully-crossed 
factorial design or an interpenetrated design for assigning interviewers. Analysts compare 
experimental groups with respect to statistics such as subject matter means or proportions and 
draw conclusions about which treatment produces more or less reporting of the subject matter 
of interest. What is controversial, however, is determining which is “better” in a measurement 
sense, a difficulty that is much reduced when criterion data (such as administrative records) 
are available. 

Without criterion data, it is often necessary for the analyst to resort to strong assumptions 
about measurement errors, such as: 

1. more reporting is better reporting; 
2. forgetting of meaningful material increases with the passage of time; 
3. unbounded interviews contain overreports, bounded interviews don’t; 
4. reporting performance decays with length of interview or time-in-sample; 
5. people are basically lazy and devious: they will lie to avoid being asked a detailed set 
of questions; and 
6. self reports are better than proxy reports. 


Indeed, these assumptions have become part of the folklore of survey design in the western 
world. And yet, it is difficult to find any support for any of these assumptions from 
appropriately designed record checks. Experiments and related arrangements are excellent 
approaches to pinpointing the sources of variation and to untangling estimation problems of 
collinearity, but are often unnecessary and seldom sufficient for evaluating an existing 
measurement process. 
In sum, these other evaluation approaches are forced to make strong assumptions about: 


1. the independence of the original and evaluation measures when they are clearly dependent; 

2. the relationship of the original measure to a criterion when no objective, external link 
exists; and/or 

3. cognitive processes not supported by research. 


Record checks also employ assumptions in evaluating measurements. For example, the usual 
way of estimating the response bias is to assume no record bias ($\bar{u} = 0$) and simply calculate 
the average of the differences between the matched survey and record observed values: 


$\text{Estimated Survey Bias} = \sum_i (S_i - R_i)/N.$ 


While one cannot directly support the no-record-bias assumption, one can conduct meaningful 
sensitivity tests of the effects of possible violations of the assumption on evaluation conclusions. 
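
As a minimal sketch of this calculation and of one simple sensitivity test (the matched amounts and the candidate record-bias values below are hypothetical, chosen only for illustration): under the two measurement models, S_i - R_i = e_i - u_i, so the mean difference estimates the survey bias minus the record bias, and an assumed nonzero record bias can simply be added back.

```python
from statistics import mean


def estimated_survey_bias(survey, record):
    """Mean of matched survey-minus-record differences (assumes record bias is 0)."""
    if len(survey) != len(record):
        raise ValueError("survey and record values must be matched one-to-one")
    return mean(s - r for s, r in zip(survey, record))


def bias_under_assumed_record_bias(survey, record, assumed_record_bias):
    """Because S_i - R_i = e_i - u_i, survey bias = mean difference + record bias."""
    return estimated_survey_bias(survey, record) + assumed_record_bias


# Hypothetical matched monthly amounts (dollars) for five sample persons.
survey = [300, 0, 450, 120, 200]
record = [320, 0, 450, 100, 250]

naive = estimated_survey_bias(survey, record)

# Sensitivity test: how the conclusion moves under a few assumed record biases.
sensitivity = {
    u_bar: bias_under_assumed_record_bias(survey, record, u_bar)
    for u_bar in (-10, 0, 10)
}
```

If the sign or rough size of the estimate survives across the plausible range of record biases, the evaluation conclusion is not very sensitive to the no-record-bias assumption.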


1.3 Issues in Designing Record Checks 


Several issues merit consideration in designing a record check to evaluate survey measure- 
ment. We comment on some of the main ones here: incomplete observation designs, matching 
errors, record errors, true value differences, and absence of repeated measures or experimental 
design features. 


1.3.1 Incomplete Observation Designs 


Past record checks have often used one-directional or partial designs for data collection, 
such as when we survey people about owning library cards and check the records for those 
who claim to have one, or sample from a list of people with a diagnosed chronic disease and 
survey them to see if they report it in a survey questionnaire. Because these partial designs do 
not observe the full range of response errors in the correct proportions, they yield biased 
estimates of such classical measurement error parameters as the response bias and the response 
error variance. One-directional designs can fail to detect some or all of the true survey bias, 
can cause the analyst to interpret up to one-half of the response error variance as response bias, 
and can predetermine the sign of the estimated response bias if the measured variable is binary 
(Marquis 1978). Full designs are a necessary (albeit not sufficient) condition for obtaining 
unbiased estimates of the desired response errors. 
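
The sign problem for a binary variable can be illustrated with a small sketch. The data are hypothetical (1 = participation, 0 = no participation); the point is only that a design which checks records solely for people who report "yes" can never observe an underreport, so its estimated bias cannot be negative.

```python
def mean_difference(pairs):
    """Average survey-minus-record difference over (survey, record) pairs."""
    return sum(s - r for s, r in pairs) / len(pairs)


# Hypothetical binary participation data: (survey report, record value).
full_design = [(1, 1), (1, 0), (0, 1), (0, 1), (0, 0), (1, 1), (0, 0), (0, 1)]

# Full design: every sample person is checked, so overreports (1, 0) and
# underreports (0, 1) are both observed.
full_bias = mean_difference(full_design)

# One-directional design: records are checked only for people who report
# "yes", so underreports (0, 1) drop out of the comparison entirely.
claimed_yes = [(s, r) for s, r in full_design if s == 1]
partial_bias = mean_difference(claimed_yes)
```

In this made-up example the full design shows net underreporting while the partial design shows net overreporting, purely as an artifact of which cases were observed.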


1.3.2 Matching Errors 


The essence of the record check is a one-to-one matching of survey and record observations. 
This is difficult to do correctly, and matching errors (false matches, false nonmatches) will 
potentially bias the measurement error estimates of interest. Neter et al. (1965) show that when 
there are no unmatched cases, the mismatches will bias the estimates of response error variance 
upward. In terms of the reliability of a dichotomous measure (which is a function of the response 
error variance), the estimate will be attenuated by exactly the match error rate (Marquis et al. 
1986). It is therefore desirable to keep match errors to a minimum and to know something about 
the errors that remain. 
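
A small sketch of the upward bias, using hypothetical amounts: the survey reports below agree perfectly with the correct records, so the true response error variance is zero, yet a single pair of false matches makes the variance of the survey-minus-record differences positive.

```python
from statistics import pvariance


def difference_variance(survey, record):
    """Population variance of the survey-minus-record differences."""
    return pvariance([s - r for s, r in zip(survey, record)])


# Hypothetical amounts; survey reports equal the records (no response error).
record = [100, 250, 0, 400, 175, 60]
survey = list(record)

correct_links = difference_variance(survey, record)

# Introduce false matches by swapping the records linked to persons 0 and 1.
mismatched = list(record)
mismatched[0], mismatched[1] = mismatched[1], mismatched[0]
with_mismatch = difference_variance(survey, mismatched)
```

All of the apparent error variance in the second calculation comes from the linkage step, not from the respondents.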


1.3.3 Administrative Record Errors 


As noted earlier, one usually has confidence that the records in a record check study are 
very good measures of the trait of interest. If the implied assumptions about record measurement 
bias and record measurement error variance are violated, this can cause the response error 
estimates to be biased away from zero. For example, bias in the record observations can appear 
as bias in the survey observations but with the opposite sign. Feather (1972) describes this effect 
in a record check of physician visits in Saskatchewan, in which an apparently large survey over- 
reporting rate was due to the record’s recording a complete treatment procedure rather than 
the individual visits for diagnosis. Similarly, the presence of measurement error variance in 
the record can cause inflated estimates of response error variance in the survey (Marquis 1978). 


1.3.4 True Value Differences 


Problems arise when the survey and record systems use different definitions. This is often 
the case in “aggregate comparisons” of population parameter estimates made separately by 
each source. A common difference is in the scope of the populations covered, such as when 
the survey frame is limited to the civilian, noninstitutionalized population and the record 
includes everybody. Case-by-case matching can minimize the threats posed by differential cov- 
erage, but even estimates derived from these studies can still be plagued by differences in the 
concepts or the attributes of the concept. For example, Cox and Iachan (1987) report the results 
of a study which compared survey-reported health conditions with medical records. The authors 
conclude that a major reason for the lack of correspondence between survey and record reports 
was differing concepts — the survey was designed to elicit the complaints which led to doctor 
visits while the medical records focused on final diagnoses. As an example from our study, 
the administrative records often contain the date a check was written for a transfer payment, 
while our survey respondents tell us when they received the payment. Such differences can 
threaten our time-related estimates of such things as telescoping response errors. 


1.3.5 Absence of Experiments and Reinterviews 


Evaluation record checks can detect errors but are not good at evaluating the remedies for the 
errors. To know how well a different survey design might perform, one must usually either test 
the alternative design options or arrange to estimate parameters of an underlying model from 
which survey designs can be derived (e.g., a model of forgetting effects). For example, an evaluation 
record check design can estimate and compare response errors for self and proxy respondents. 
Without heroic assumptions it cannot, however, suggest how the measurement error parameters 
would change if the survey’s respondent rule were changed (say, to allow only self response). 

Similarly, a record check without a reinterview or another set of independent measures is 
limited in the number of basic error parameters it can estimate. For example, our initial defini- 
tions mentioned three parameters: true value, survey error, and record error. Without a reinter- 
view (or other independent measure) there are only two measures with which to estimate the three 
unknowns. An additional measure can help identify the estimates of the parameters in the model. 
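
The identification problem can be written out from the two measurement models in Section 1.1. This is a sketch in the notation used earlier, with S_i and R_i the matched survey and record values; it adds no assumption beyond those models:

```latex
S_i - R_i = (\mathrm{True}_i + e_i) - (\mathrm{True}_i + u_i) = e_i - u_i,
\qquad
\operatorname{Var}(S - R) = \operatorname{Var}(e) + \operatorname{Var}(u) - 2\,\operatorname{Cov}(e, u).
```

The matched difference removes the true value but leaves the survey and record errors confounded, which is why separating them requires either an assumption about the record error or a further independent measure.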


2. CHARACTERISTICS OF SIPP 


Here we briefly describe the main features of SIPP — the Survey of Income and Program 
Participation — as a prelude to discussing the record check evaluation design. 


2.1 Overview of SIPP Contents 


The purpose of SIPP is to provide improved information on the economic situation of people 
and households in the United States. It collects comprehensive longitudinal data on cash and 
noncash income, eligibility for and participation in Government transfer programs, assets and 
liabilities, labor force participation, and a host of related topics. SIPP data assist the evaluation 
of the cost and effectiveness of current Federal government programs, the potential impacts of 
proposed program changes, and the actual impacts of changes when implemented. In general, the 
Census Bureau and other Government agencies which have fostered and supported the development 
of SIPP expect it to be an invaluable tool for domestic policy planning (Nelson et al. 1985). 

Core SIPP questions — repeated in each wave of interviewing — cover labor force participa- 
tion and amounts and types of income received, including transfer payments and noncash 
benefits from various programs for each month of the reference period. The core questions 
cover nearly 50 sources of income, including Government transfer payments from retirement, 
disability and unemployment benefits, and welfare programs such as Aid to Families with 
Dependent Children. Information is also gathered on noncash programs such as food stamps, 
Medicare and Medicaid; private transfers such as pensions from employers, alimony, and child 
support; ownership of assets that produce income, such as interest, dividends, rent and royalties; 
and on miscellaneous sources of income, such as estates. 


2.2 SIPP Data Collection Design 


SIPP started in October 1983 with a sample of approximately 25,000 designated housing 
units (the ‘‘1984 Panel’’) selected to represent the noninstitutional population of the United 
States. In February 1985 a new and slightly smaller panel was introduced. Additional panels 
are to be introduced each February throughout the life of the survey. Due to budget reduc- 
tions, the sample size for new panels is currently about 15,000 households. 

Each sample household is interviewed by personal visit once every four months for 2-1/2 years, 
resulting in a total of eight interviews. The reference period for each interview is the four months 
preceding the interview month. At each visit to the household, each person fifteen years of age 
or older is asked to provide information about himself/herself. Proxy reporting is permitted for 
household members not available at the time of the visit. Information concerning proxy 
response situations is recorded and is available for analytical purposes. 

To facilitate field operations, each sample panel is divided into four subsamples (“rotation 
groups”) of approximately equal size, one of which is interviewed each month. Thus, one 
“wave” or cycle of interviewing is conducted over a period of four months for each panel. This 
design produces steady field and processing workloads, but it also means that each rotation 
group uses a slightly different four-month reference period. 
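
The staggered reference periods can be sketched as follows. The function names are hypothetical, and the schedule is an idealized assumption: rotation group g of wave w is taken to be interviewed 4(w - 1) + (g - 1) months after the panel's first interview month (October 1983 for the 1984 Panel), ignoring any exceptions in the actual field schedule.

```python
from datetime import date


def shift_month(d, months):
    """First day of the month that lies `months` months away from date d."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def reference_period(interview_month):
    """The four calendar months preceding the interview month, oldest first."""
    return [shift_month(interview_month, -k) for k in range(4, 0, -1)]


def interview_month(wave, rotation_group, panel_start=date(1983, 10, 1)):
    """Assumed idealized schedule (see lead-in); not taken from SIPP documentation."""
    return shift_month(panel_start, 4 * (wave - 1) + (rotation_group - 1))


# Reference months for each rotation group in Wave 1 of the 1984 Panel.
wave1 = {g: reference_period(interview_month(1, g)) for g in range(1, 5)}
```

Under this assumption rotation group 1 reports on June through September 1983 in Wave 1, while rotation group 4 reports on September through December 1983.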

Beginning with the second wave of interviewing in the 1984 panel, SIPP conducts reinterviews 
with a small sample of households about a subset of items (including program participation). These 
data are used to check for interviewer falsifications and perhaps to estimate response inconsistencies. 


3. RECORD CHECK DESIGN 


The purpose of the record check is to provide an evaluation of some of the income data 
gathered in SIPP. We highlight important features of the design of the record check next, 
covering the samples, the administrative records, the matching approach, and the analysis. 


3.1 Record Check Samples 


The SIPP Record Check uses a ‘‘full’’ rather than a one-directional design; that is, the records 
allow us to validate all observed values in the survey. Design options we did not choose include: 
1. checking records only for people who claimed to be participating in a program, or 
2. drawing a sample of known recipients and interviewing them to determine how truthfully 
they report. 

Both of these designs are incomplete and will result in biased estimates of the response error 
parameters. 

The Record Check Study restricts attention to a subset of available SIPP data from the 1984 
Panel. First, the sample of people is restricted to households in four target states: Florida, New 
York, Pennsylvania, and Wisconsin. In the 1984 Panel this translates to approximately 5,000 
households. Second, the study’s sample of time periods includes only the first two waves of 
the 1984 Panel. Figure 1 illustrates the wave, rotation group, interview month, and reference 
period structure for the target survey data. 

Third, the SIPP Record Check Study focuses on the quality of recipiency and amount 
reporting for selected Government transfer programs. It compares survey reports and 
administrative records for five Federally-administered programs (Federal Civil Service Retire- 
ment, Pell Grants, Social Security (OASDI), Supplemental Security Income (SSI), and 
Veterans’ Compensation and Pensions), and four state-administered programs (Aid to Fami- 
lies with Dependent Children (AFDC), food stamps, unemployment compensation, and 
worker’s compensation). 

We limited the study to four states — Florida, New York, Pennsylvania, and Wisconsin — in 
order to keep the study to manageable proportions. Major criteria used to select these states were: 

1. the presence of a computerized, accessible, and complete record system for all target 
programs; 
2. a large SIPP sample; 
3. reasonable geographic diversity; and 
4. a willingness to share individual-level data for purposes of this research. 

Thus, the states were selected purposively; no attempt was made to sample states to be represen- 
tative of the Nation. 

We requested from each participating state agency identifying and receipt information for 
all persons who received income from the target program at any time from May 1983 through 
June 1984. The identical request was made of the participating Federal agencies, with the excep- 
tion that only recipients residing in one of the four selected states were to be included in the 
data extract. 
Figure 1. Survey Structure for Data Included in the SIPP Record Check Study. 


¹ Technically, rotation group 4 of the 1984 SIPP Panel was not administered a Wave 2 interview. The “missing” interview was 
transparent to respondents, however, who were simply given their Wave 3 interview at the time they would have received the Wave 2 
interview. For present purposes, the Wave 3 interview for rotation group 4 is identical to the Wave 2 interview for all other rotation 
groups, and is included in the Record Check Study in order to have two interviews from all sample cases. All references in the text 
of this paper to ‘‘Wave 2’’ include the Wave 3 interview for this portion of the panel. 
As noted earlier, errors in the records can cause problems for record check evaluation studies. 
Although several of the administrative record files obtained for this project contain very minor 
deficiencies, only two appear likely to pose major analytical problems: the New York worker’s 
compensation file, and the Veterans’ Compensation and Pensions file. Each is known to be 
incomplete in its coverage of recipients. The New York file excludes an unknown number of cases 
which were “closed” (i.e., cases which had already been adjudicated and for which payments by 
a private insurance carrier had already begun) at the time the data base was created several years 
ago. The Veterans’ file excludes the approximately one percent of all recipients whose benefits were 
sent to a financial or other institution. There are no known coverage problems with any other files. 

An unavoidable problem which afflicts all of the administrative files to some extent is the 
discrepancy between payout date and receipt of payment; obviously, the SIPP respondent 
reports the latter and has no knowledge of the former, and the reverse is true for the program 
records. Where the payout date is close to the end of a month it may be difficult to distinguish 
a forward telescoping error from a legitimate difference between month of payment and month 
of receipt. Where there are definitional discrepancies, such as this payment date issue, our 
analyses will attempt to model them explicitly. 


4. MATCHING 


4.1 Introduction 


The quality of matching has important effects on some of the most critical response error 
estimates, such as the response error variance. Ideally, variables used to match survey and record 
observations are measured without error and are able to identify an individual uniquely. The 
ideal, of course, is never realized. 

However, the variables we have available to match surveys and records should go a long 
way toward minimizing the match errors. Some, such as social security number (SSN), 
uniquely identify an individual even if other information such as address is outdated, garbled, 
obliterated, or missing. For purposes not directly related to this study (although certainly 
of benefit to it), the Census Bureau has taken special measures to ensure that SSN informa- 
tion as reported to the SIPP is complete and valid. For all Wave 1 and 2 sample persons, 
reported SSN’s and reports of not having an SSN were verified and, if necessary, corrected, 
by the Social Security Administration. Sater (1986) estimates that as a result of this operation 
the SIPP file contains a valid SSN for about 95 percent of SIPP sample persons who have one. 

The wealth of other data — last name, first name, house number, street name, apartment 
designation, city, zip code, sex, and date of birth — is sufficient for high quality matching 
even in the absence of a unique identifier such as SSN. In addition, to aid us in evaluating the 
impact of any remaining match errors, the Census Bureau’s matcher produces an ordinal 
measure of the goodness of the match/nonmatch of each survey observation to its appropriate 
administrative record counterpart. 


4.2 The Census Bureau’s Computerized Match Procedures 


The Record Check Study uses computerized matching procedures applying the theoretical 
record linkage work of Fellegi and Sunter (1969). The process involves multiple discrete steps, 
but basically there are four: 

1. standardizing the common data fields in the two files which the matcher will examine 
to determine whether a pair of records is a match or not; 
2. sorting the two files into small subsets of records (or “blocks”) which constitute a feasible 
number of pairs to be examined by the matcher; 
3. determining and quantifying the usefulness of each data field to be considered in the 
match for identifying true matched pairs; and 
4. implementing the computer algorithms which perform the actual record matching. 


4.2.1 Standardization 


The Record Check Study processes all data files — both the SIPP files and the administrative 
record files — through an address standardizer which standardizes the format of various com- 
ponents of an address (e.g., street name, type, and direction; city name; state abbreviation; 
etc.) and parses each component into a fixed data field. Several programs have been devel- 
oped for this purpose; we use the ZIPSTAN standardizer developed at the Census Bureau. 

In addition to the standardization procedures which apply to all data files, many files require 
modifications to individual data fields to ensure a common format across files for matching. 
Common examples of variables which pose problems of this type are sex (which can be 
represented by either an alpha (“m” or “f”) or a numeric (“1” or “2”) code); date of birth 
(which has many variants, e.g., “mm-dd-yy,” or “cc-yy-mm-dd,” or the Julian format); and 
name (which may be a single field or which may have separate fields for each component). 
We prepare custom-made programs for this type of standardization. 
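
A minimal sketch of what such a custom standardization step might look like. This is not the study's program: the function names, the coding 1 = male and 2 = female, the default century of 19 for two-digit years, and the five-character "yyddd" layout for the Julian format are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumed coding: alpha "m"/"f" and numeric 1 = male, 2 = female.
SEX_CODES = {"m": "M", "f": "F", "1": "M", "2": "F"}


def standardize_sex(raw):
    """Map alpha or numeric sex codes to "M"/"F"; None if unrecognized."""
    return SEX_CODES.get(str(raw).strip().lower())


def standardize_dob(raw, fmt, century=19):
    """Return a date of birth as "YYYY-MM-DD".

    fmt is "mm-dd-yy", "cc-yy-mm-dd", or "yyddd" (assumed Julian layout:
    two-digit year followed by day of year).
    """
    raw = raw.strip()
    if fmt == "mm-dd-yy":
        mm, dd, yy = (int(part) for part in raw.split("-"))
        parsed = date(century * 100 + yy, mm, dd)
    elif fmt == "cc-yy-mm-dd":
        cc, yy, mm, dd = (int(part) for part in raw.split("-"))
        parsed = date(cc * 100 + yy, mm, dd)
    elif fmt == "yyddd":
        yy, day_of_year = int(raw[:2]), int(raw[2:])
        parsed = date(century * 100 + yy, 1, 1) + timedelta(days=day_of_year - 1)
    else:
        raise ValueError(f"unknown date format: {fmt}")
    return parsed.isoformat()
```

For example, "03-07-45", "19-45-03-07", and "45066" all standardize to the same date under these assumptions, so the field can be compared directly across files.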


4.2.2 Blocking 


Blocking — establishing subsets of records for the matcher to examine in searching for 
matched pairs of records (e.g., Jaro 1985) — is necessary when matching files with large numbers 
of records. Obviously, the probability of finding all true matches would be highest if, for each 
record on one file, the entire other file were searched for a match. However, for large files 
such unrestricted searches for matched records are simply not feasible. Blocking each file into 
subsets of records makes matching large files feasible, but at the cost of excluding some records 
from the search; it thus increases the likelihood that some true matches will be missed. Ideal 
blocking components, therefore, have sufficient variation to ensure the partitioning of the files 
into many (and therefore smaller) blocks, and are effective match discriminators — that is, nearly 
always agree in true match record pairs and nearly always disagree in true nonmatch record pairs. 

The study uses multiple independent blocking strategies for each pair of files to be matched, 
thus minimizing the likelihood that a true match pair will escape detection as a result of blocking. 
One primary blocking strategy employs the first three digits of the United States Postal Ser- 
vice’s five-digit ZIP code and a four-character SOUNDEX code derived from the sample 
person’s/recipient’s last name. The ZIP code is a sub-state geographic indicator which 
generally is recorded quite accurately according to Census Bureau matching experts. The SOUNDEX 
algorithm is widely used for creating a standard length, standard format code from input 
character strings of varying lengths; its advantage for blocking purposes is that it minimizes 
blocking errors due to misspellings, although it cannot eliminate such errors entirely. The second 
primary blocking arrangement uses the last four digits of the SSN. 
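
The two blocking keys can be sketched as follows. The SOUNDEX routine follows the common American Soundex rules and is not necessarily the exact variant used in the study; the function names and the way the key components are concatenated are hypothetical.

```python
def soundex(name):
    """Four-character American Soundex code (first letter plus three digits)."""
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
        if len(result) == 4:
            break
    return result.ljust(4, "0")


def zip_name_block_key(zip_code, last_name):
    """First blocking strategy: first three ZIP digits plus SOUNDEX of last name."""
    return zip_code[:3] + soundex(last_name)


def ssn_block_key(ssn):
    """Second blocking strategy: last four digits of the SSN."""
    digits = "".join(c for c in ssn if c.isdigit())
    return digits[-4:]
```

Two records with last names spelled "Smith" and "Smyth" in the same three-digit ZIP area fall into the same block, which is the tolerance to misspelling described above.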


4.2.3 Data Field Match Weights 


With some variation, the data fields used in the matching of the SIPP and administrative 
record files include house number, street name, apartment number, city, ZIP code, SSN, sex, 
date of birth, last name, and first name. Intuitively, these fields are not equally useful in deter- 
mining whether a particular pair is a match or not — as an obvious example, agreement on 
sex is not as indicative of a true match as is agreement on SSN. Fellegi and Sunter (1969) include, 
in their presentation of a general theory of record linkage, discussions of weight calculations 
reflecting different data fields’ differing discriminating powers and how these weights feed into 
optimal decision rules. The Census Bureau’s Record Linkage Research Staff has developed 
programs using Newton’s method for non-linear systems (see Luenberger 1984) to solve the 
Fellegi-Sunter equations, and these programs are used in the SIPP Record Check Study to com- 
pute final match weights. 
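
To make the idea of field weights concrete, the sketch below uses the usual form of the Fellegi-Sunter weights under the conditional independence assumption: a field contributes log2(m/u) when it agrees and log2((1 - m)/(1 - u)) when it disagrees, where m and u are the probabilities of agreement among true matches and true nonmatches. The m and u values here are hypothetical and are simply supplied by hand; the sketch does not reproduce the Newton's-method estimation described above.

```python
import math


def field_weights(m, u):
    """Agreement and disagreement weights for one data field.

    m = P(field agrees | true match), u = P(field agrees | true nonmatch).
    """
    return math.log2(m / u), math.log2((1 - m) / (1 - u))


# Hypothetical m and u probabilities, for illustration only.
M_U = {"ssn": (0.95, 0.0001), "sex": (0.98, 0.5), "last_name": (0.90, 0.01)}


def composite_weight(agreements):
    """Sum the field weights over a comparison pattern {field: True or False}."""
    total = 0.0
    for field, agrees in agreements.items():
        agree_w, disagree_w = field_weights(*M_U[field])
        total += agree_w if agrees else disagree_w
    return total


all_agree = composite_weight({"ssn": True, "sex": True, "last_name": True})
ssn_differs = composite_weight({"ssn": False, "sex": True, "last_name": True})
```

With these illustrative values, agreement on SSN carries far more weight than agreement on sex, matching the intuition stated at the start of this section.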


4.2.4 The Computer Matcher 


The Census Bureau’s computer matcher executes the Fellegi-Sunter procedures on a user- 
defined set of data fields on files sorted (blocked) according to user specifications. For each 
data field to be considered in the match, the user supplies match weight seed values, defines 
the type of agree/disagree comparison (whether the fields must be exactly comparable in order 
for the matcher to treat them as agreeing, or whether only approximate comparability is 
necessary), and identifies missing value entries and specifies how they are to be treated (included 
or ignored in the calculation for a composite match weight). The user sets the composite weight 
cutoff values for matched pairs and nonmatched pairs, and generates the appropriate COBOL 
program codes to conduct a match through GENLINK, the Census Bureau’s Record Linkage 
Program Generator (LaPlant 1987). 

In simple terms, the matcher: 

1. searches each data file for comparable blocks of records, that is, records which agree 
exactly on the designated blocking components; 
2. counts the number of records in found blocks to ensure that neither file’s block size 
exceeds the preset maximum; 
3. computes a composite match weight for all possible pairs of records in the block; 
4. within the block, assigns each record in one file to a paired record in the other file according 
to a formula which maximizes the total composite weight for all pairs in the block; 
5. applies the Fellegi-Sunter decision procedure to determine whether a pair is a match, a 
nonmatch, or requires further review; and 
6. produces a “pointer” file map to the paired records in each file. 
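
Steps 4 and 5 can be sketched for a single small block as follows. This is an illustration, not the Census Bureau matcher: the assignment is found by exhaustive search (practical only for very small blocks), and the weights and the two cutoff values are hypothetical.

```python
from itertools import permutations


def best_assignment(weights):
    """Pair each record of file A with a distinct record of file B.

    weights[i][j] is the composite weight for record i of file A and
    record j of file B (assumes file A has no more records than file B).
    Returns the pairing with the largest total composite weight.
    """
    n_a, n_b = len(weights), len(weights[0])
    best_total, best_pairs = float("-inf"), []
    for cols in permutations(range(n_b), n_a):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_pairs


def classify(weight, lower, upper):
    """Three-way decision using a lower and an upper cutoff."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "review"


# Hypothetical composite weights: 2 survey records by 3 administrative records.
block = [[12.0, -3.0, 1.5],
         [-4.0, 2.0, 5.0]]
pairs = best_assignment(block)
decisions = [(i, j, classify(block[i][j], lower=0.0, upper=8.0)) for i, j in pairs]
```

A production matcher would replace the exhaustive search with an efficient assignment algorithm, but the logic of pairing first and then classifying each assigned pair against the cutoffs is the same.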


5. ANALYSIS 


Our goals for the record check study are to estimate selected measurement error parameters 
for our samples of people, content, and times, and to assess how these errors relate both to 
each other and to variables that reflect survey design features. Our general plan is to use the 
matched data to estimate for each dichotomous participation variable: 

1. the response bias (using the survey-minus-record difference score); 

2. predictors of the response bias (using logistic or probit regression techniques or possibly
LISREL techniques based upon matrices containing polyserial and tetrachoric coefficients
of association (Joreskog and Sorbom 1984));

3. the response error variance (e.g., derived from regression residuals); 

4. the conditions or groups associated with very large and very small response error 
variances; and 

5. the kinds and amounts of confusion among transfer programs that contribute to the 
response errors (using covariance structure analysis procedures such as LISREL). 

(We will estimate the same parameters for reports of the amounts of money received from each 
transfer program but have not yet selected our basic estimation approach.) 

The measurement error issues to be addressed fall into one of two categories: issues which
apply to all time periods and issues that require comparing errors across time periods. In the 
former category are estimates of the amounts of response errors for self and proxy respondents 
or contributed by interviewers. In the latter category are the errors arising from panel surveys 
with familiar labels such as telescoping, time-in-sample bias, memory decay, rotation group 
bias, etc. — those implying that measurement errors will differ across time periods when 
everything else is held constant. To this list we add what Hill (1987) has referred to as the ‘‘seam’’
bias in longitudinal surveys, which we discuss below.

To appreciate the applied questions we wish to address about the different time periods, 
consider Figure 2, which presents the interview and reference month calendar for one rota- 
tion group of SIPP respondents: 

The figure shows two interviews. The first takes place in early October and asks about what 
happened in September (last month), August (two months ago), July (three months ago), and 
June (four months ago). Similarly, the second interview, taken four months later, asks about 
January, December, November, and October. We refer to the transition between September 
and October as the ‘‘seam’’ because it is between the reference periods covered by the two 
interviews. 

To investigate the internal telescoping hypothesis (which asserts that events are not forgotten,
just remembered as having happened closer to the present time), we will test whether
the response bias for the early months of the reference period (June and July in Wave 1
and October and November in Wave 2) is negative, whether the response bias for later
months (August and September or December and January) is positive, and whether the two biases
sum to zero.

We plan to test the bounded interview hypothesis, which says that events from the remote 
past are reported as happening within an unbounded reference period (June through 
September), but that this will not happen in reference periods bounded by a previous inter- 
view (here, October through January). 

To examine the hypothesis about memory decay (that the probability of forgetting an event 
increases with the passage of time), we will test whether the response bias is more negative for 
the early months of each reference period than for later months. 
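
As a rough illustration of how these month-by-month hypotheses could be examined, the sketch below computes the response bias (the mean survey-minus-record difference for a dichotomous participation report) for each reference month and then averages it over early and late months. The function names and the tiny data set are hypothetical, and no significance testing is attempted.

```python
from statistics import mean


def response_bias(survey, record):
    """Mean survey-minus-record difference for paired 0/1 participation reports."""
    return mean(s - r for s, r in zip(survey, record))


def bias_by_month(reports):
    """reports maps month -> (survey reports, record values); returns month -> bias."""
    return {month: response_bias(s, r) for month, (s, r) in reports.items()}


def early_late_bias(biases, early, late):
    """Average bias over the early months and over the late months."""
    return mean(biases[m] for m in early), mean(biases[m] for m in late)


if __name__ == "__main__":
    # Hypothetical Wave 1 data for four matched persons.
    reports = {
        "JUN": ([1, 0, 0, 1], [1, 1, 0, 1]),
        "JUL": ([1, 0, 0, 1], [1, 1, 0, 1]),
        "AUG": ([1, 1, 1, 1], [1, 1, 0, 1]),
        "SEP": ([1, 1, 1, 1], [1, 1, 0, 1]),
    }
    biases = bias_by_month(reports)
    print(early_late_bias(biases, ["JUN", "JUL"], ["AUG", "SEP"]))
```

Under the telescoping hypothesis the early-month bias would be negative, the late-month bias positive, and their sum near zero; under memory decay the early-month bias would simply be more negative than the late-month bias.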


                   Wave 1                               Wave 2
Reference   4 mos.  3 mos.  2 mos.  last      4 mos.  3 mos.  2 mos.  last
Month       ago     ago     ago     month     ago     ago     ago     month

Calendar
Month       JUN     JUL     AUG     SEP    |  OCT     NOV     DEC     JAN     FEB
                                           |
                                       ‘‘Seam’’

Wave 1 Interview Month: OCT
Wave 2 Interview Month: FEB


Figure 2. SIPP Survey Time Periods for Rotation Group 1 Showing Reference Months, Calendar 
Months, Interview Months, and Interview ‘‘Seam’’. 

The time-in-sample and rotation group hypotheses suggest that response errors will be greater
in the second interview than the first, after correcting for any seasonal effects. We plan to 
examine this and, if we find it to be true, test some of the ideas in the literature about why 
it may be true. Are the sample elements that survive from the first to the second interview
different, as Stasny and Fienberg (1985) suggest, or does the quality of the survivors’ reporting
deteriorate, as the Neter and Waksberg (1966) conditioning hypothesis might predict?

We don’t know yet the extent to which SIPP is experiencing these more traditional prob- 
lems of longitudinal surveys. One problem for which there is evidence, however, concerns the 
estimation of month-to-month changes in program participation (Burkhead and Coder 1985). 
Specifically, more changes in program participation take place at the ‘‘seam’’ between inter- 
views (between September and October in Figure 2) than between the months covered by any 
one interview (e.g., between June and July or July and August or August and September). The 
Census Bureau has not published monthly program participation transition estimates from 
SIPP yet because the estimates show a pattern that appears to be affected heavily by measure- 
ment error. Moore and Kasprzyk (1984) and Hill (1987) have speculated about what kinds of 
response, nonresponse, or procedural errors might be producing the pattern and which set of 
transition estimates is more accurate. By addressing the problem with administrative data, we 
hope to come much closer to a definitive explanation about the role of response and nonresponse 
errors in producing the observed pattern. 

Related, possibly, to the seam bias issue is the better-understood phenomenon that measure- 
ment error variance tends to inflate estimates of gross change or underestimate stability. Recent 
literature (e.g., Fuller and Tin 1986) suggests several possible approaches to the problem. We 
plan to begin the empirical exploration of the measurement error effects on the transition 
estimates to learn whether, for example, we can base corrections for the response errors on 
estimates from reinterviews. 

Finally, we have hinted previously at the problems that may arise in getting unbiased 
estimates of the errors if the records also contain errors. We plan, with the use of reinterview 
measures (that identify the estimate of Var e) to estimate the record error variance (Var u).
However, we have no plans to relax the assumption that the records are unbiased. 


6. PRELIMINARY FINDINGS 


To illustrate our approach, we examine the ‘‘seam’’ issue with data for two Government 
transfer programs in one state. Recall that the seam problem is that monthly survey reports 
about program participation status show more frequent status changes between months covered
by separate interviews than between other months (covered by the same interview). With the 
administrative record data we are able to begin to answer key questions concerning the quality 
of SIPP transition estimates: Are too many transitions reported at the seam? Are too few 
reported for other months? Does SIPP capture the right number of changes over the whole 
reference period but distribute them incorrectly? 

Figures 3 and 4 contain results of our initial seam bias analyses. Data for these initial analyses 
come from matched/merged SIPP and administrative record files for Aid to Families with 
Dependent Children (AFDC) and food stamps in the state of Wisconsin. 

A total of 1,632 people were eligible SIPP sample persons in Wisconsin in Wave 1 of 
the 1984 SIPP Panel. Of this total, 92 (6%) refused to report an SSN and were excluded 
both from the administrative record match and from the response error analyses. Also, 
the sample residing in Wisconsin is part of a national sample and is not necessarily representative 
of Wisconsin. 

Figure 3. Month-to-Month AFDC Participation Transitions: Comparison of Transition Frequency at
the Seam with the Average Frequency Within Waves 1 and 2, and the Overall Average Across
All Months, for SIPP and Administrative Records.

Figure 4. Month-to-Month Food Stamps Participation Transitions: Comparison of Transition Frequency
at the Seam with the Average Frequency Within Waves 1 and 2, and the Overall
Average Across All Months, for SIPP and Administrative Records.

SIPP procedures assume that all sample persons identified in Wave 1 were eligible sample
persons in the same household for all months of the Wave 1 reference period, and that no one
other than those eligible at the Wave 1 interview was a household member in the preceding 
four months. Thus, the month-to-month transition estimates within Wave 1 derive from a con- 
stant respondent base of (1,632-92 =) 1,540 people. In Wave 2, however, the fluidity of 
household composition is recognized, resulting in respondent bases which vary slightly from 
one month-pair to the next — including the interview seam. In the data below the number of 
eligible persons in both ‘‘seam’’ months is 1,517; within Wave 2 the respondent bases for the 
three month-pairs are 1,522, 1,531, and 1,532. (Separate analyses (not shown here) indicate 
that the trends shown in Figures 3 and 4 are not sensitive to excluding people not present in 
all eight months of the Wave 1 and 2 reference periods.) Because of the small number of cases
and the unrepresentative nature of the Wisconsin sample we do not offer inferential statistics 
for this set of illustrations. 

In the figures, the striped bars indicate the number of transitions according to administrative
records and the empty bars indicate the number of transitions according to SIPP. If there are
too many SIPP transitions at the seam, the empty bar should tower over the striped bar for
the comparisons labelled ‘‘Seam.’’ If there are too few transitions reported in SIPP for the
months covered within an interview, the empty bar should be smaller than the striped bar for the
comparisons labelled ‘‘Wave 1’’ and ‘‘Wave 2.’’ And, if SIPP interviews yield approximately
the right number of transition reports, the empty and striped bars should be approximately
the same height for the comparisons labelled ‘‘Average Across All Months.’’
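
The comparison drawn in the figures can be expressed compactly. The sketch below counts month-to-month status changes for one data source and summarizes them as the seam count, the within-wave averages, and the overall average. It is an illustration only: the function names and the three-person histories are hypothetical, and it assumes a constant respondent base across all eight months, which the actual Wave 2 data do not have.

```python
def count_transitions(histories):
    """histories: per-person lists of monthly 0/1 participation status.
    Returns the number of status changes for each pair of adjacent months."""
    n_months = len(histories[0])
    return [sum(h[i] != h[i + 1] for h in histories) for i in range(n_months - 1)]


def seam_summary(histories, seam_index=3):
    """seam_index is the month-pair spanning the two interviews; with eight
    months (JUN to JAN) the SEP-OCT pair has index 3."""
    counts = count_transitions(histories)
    wave1 = counts[:seam_index]
    wave2 = counts[seam_index + 1:]
    return {
        "seam": counts[seam_index],
        "wave1_avg": sum(wave1) / len(wave1),
        "wave2_avg": sum(wave2) / len(wave2),
        "overall_avg": sum(counts) / len(counts),
    }


if __name__ == "__main__":
    # Hypothetical histories for three persons, JUN through JAN.
    sipp = [[1, 1, 1, 1, 0, 0, 0, 0],
            [0, 0, 0, 0, 1, 1, 1, 1],
            [1, 1, 1, 1, 1, 1, 1, 1]]
    records = [[1, 1, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 1, 1, 1],
               [1, 1, 1, 1, 1, 1, 1, 1]]
    print("SIPP   ", seam_summary(sipp))
    print("Records", seam_summary(records))
```

In this made-up example both sources contain the same total number of transitions, but the survey places all of them at the seam, which corresponds to a pure time placement problem.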

Figure 3 presents the average frequency of month-to-month transitions in Wisconsin AFDC 
participation within Waves 1 and 2 for the two data sources, and contrasts those figures with 
the number observed at the Wave 1/2 interview seam. The SIPP ‘‘seam bias’’ problem is quite 
apparent — the frequency of transitions at the seam is greater than the average within either 
interview. Although the absolute differences with this sample size are small, the record data 
suggest that the AFDC seam bias results from a combination of too many transitions reported 
at the seam and too few in the within-interview months. The final columns of Figure 3
also suggest a net underreporting of AFDC transitions in SIPP, beyond the
time placement problem.

The Wisconsin food stamps results are summarized in Figure 4, where the seam bias effect 
in SIPP is even clearer. Once again, the administrative record data suggest a tendency for within- 
interview transitions to be consistently underestimated with SIPP data. And, in this instance 
the contrast of survey and record data is even more clear in indicating that SIPP seam transitions
are severely overestimated. Unlike the AFDC results, however, both survey and record
data contain about the same number of transitions overall, suggesting just a time placement problem
and not a net underreporting bias.


7. CONCLUSIONS 


After a lengthy matching and file preparation process, we are just beginning our analysis 
of this rich data set. However, with just the initial results presented here we have already shown 
how record check findings can contribute to our understanding of important measurement error 
issues — in this case, the SIPP seam bias. There are many more tests to be done and many 
hypotheses to explore before we can draw definitive conclusions about the nature of SIPP 
measurement errors and their probable causes. We are confident that the SIPP Record Check 
Study will allow us to make important advances toward understanding the sizes and forms of 
these survey errors and perhaps suggest their causes. 


The Use of Administrative Data for Initial and
Subsequent Profiles of Economic Entities 


COLLEEN CLARK and ROBERT LUSSIER¹


ABSTRACT 


Statistics Canada is currently rebuilding its central register of economic entities. The new register views 
each economic entity as a network of legal and operating entities whose characteristics allow for the delinea- 
tion of statistical entities. This network view, the profile, is determined through the ‘profiling’ process 
which involves contact with the economic entity. In 1986 a list of all entities in-scope for a profiling con- 
tact was required so that profiles could be obtained to initialize the new register. Administrative data 
were used to build this list. In the future, administrative data will be a source of information on changes 
that may have happened to economic entities. They may thus be used as a source of direct update or 
as a signal that a review of the structure of an entity is required. The paper begins with the objectives 
of the profiling process. The procedures for constructing the frame for the initial profiling process using 
several administrative data sources are then presented. These procedures include the application of con- 
cepts, the detection of overlap between sources, and the evaluation of data quality. Next, the role of 
administrative data in providing information on changes to business entities and in requesting profiles 
to be verified is presented. Then the results of a simulation study done to assess this role are reviewed. 
Finally, the paper concludes with a series of questions on the methodology of using administrative data 
to maintain profiles. 


KEY WORDS: Administrative data; Central register; Profile. 


1. INTRODUCTION 


Statistics Canada is in the process of reorganizing its programme of economic surveys. The 
new programme will result in an increased use of administrative data. These data will be part 
of a Central Frame Data Base (CFDB) from which economic surveys will draw samples. 

Administrative data will also be used to maintain the CFDB. This and other elements of 
the reorganization strategy are contained in Colledge and Lussier (1985). Experiences in the 
implementation of the strategy are contained in Colledge (1987). 

One of the first steps was to formulate definitions of the CFDB units. A fundamental unit 
is the business entity. A business entity is defined in Statistics Canada (1987) as ‘an economic 
transactor having the responsibility and authority to allocate resources in the production of 
goods and/or services, thereby directing and managing the receipt and disposition of income, 
the accumulation of property, the borrowing and lending of capital, and the maintenance of 
complete financial statements accounting for their responsibilities’. 

The Central Frame Data Base currently being built by Statistics Canada attempts to repre- 
sent the structure of the Canadian economy. It recognizes that this economy is dominated by 
a small number of large business entities who account for the majority of the activity within 
the economy. The CFDB is divided into two components paralleling this dichotomy. 

One component, the Integrated Portion (IP), provides coverage of the small number of large
or otherwise important business entities, while the other, the Non-Integrated Portion (NIP),
covers the remaining large number of smaller entities. The entities in the former component
are more complex. Hence, the identification of those portions of the complex business entity
that are of interest to a particular survey requires substantial effort.

Ottawa, Ontario, Canada, K1A 0T6; and Robert Lussier, Business Survey Methods Division, Statistics Canada,
11-M R.H. Coats Building, Tunney’s Pasture, Ottawa, Ontario, Canada, K1A 0T6.

The Integrated Portion (IP) of the CFDB attempts to represent the complex structure of 
business entities through the use of an Information Model. The model consists of five struc- 
tures linked together which describe a business entity. These structures allow survey popula- 
tions to be accurately identified. The five structures are: 


i. The legal structure which describes the legal representation of the business entity. It is
comprised of legal entities and their relationships of ownership and control. Examples of legal
entities are incorporations under federal or provincial charter.


ii. The operating structure which describes how the business entity operates and how it 
organizes its accounting system. It is comprised of operating entities. This structure 
organizes and controls the production of goods and/or services. It is an attempt to struc- 
ture the business entity as it sees itself. Examples of operating entities are divisions, profit 
centres, and plants. 


iii. The statistical structure which consists of a hierarchy of statistical entities. These entities 
are derived from the associated operating structure depending on the units within the 
operating structure for which records for a particular set of data are maintained. 


iv. The reporting structure which consists of reporting arrangements for each selected statistical 
entity by survey. The data available in the accounting system of the business entity are col- 
lected from the reporting entities. 


v. The administrative structure which contains administrative data such as income tax data 
collected from legal entities and payroll deduction account data collected from operating 
entities. 


Entities on the statistical and reporting structures are generated by Statistics Canada for 
the purpose of collecting, editing, estimating, and tabulating economic data. The entities on 
the other three structures are externally defined. 

The complex process of determining the boundaries of the business entity and of delineating 
its five IP structures and their associated links is termed ‘profiling’. This network view of the 
business entity is the ‘profile’. The data to construct a profile are obtained through a contact 
with the business entity or some component of it. The entity’s legal and operating structures
as well as some administrative structure data items are obtained, or reviewed and updated,
during the interview. The statistical structure is then generated or updated automatically from
the new operating structure. Finally, default reporting entities are created for new selected 
statistical entities using selected fields from the legal, operating or administrative structures. 
These entities may subsequently be updated as a result of the first survey contact with the 
respondents or of special arrangements negotiated with the respondents. 

The type of profiling contact used depends on the entity’s complexity and any special 
reporting arrangements. The most complex and important entities will receive a personal visit 
from either Head Office or Regional Office personnel. The remaining entities will be contacted 
by telephone. Entities will be contacted about once every two years, or more often, depending 
on how quickly their structures change. 

Cyclical profiling, whereby business entities are periodically contacted, is one method that 
will be used to keep the IP of the CFDB current. A survey feedback process and data from 
administrative sources will also be used. 

The design and construction of the CFDB is taking place over three years culminating in 
a data base that will be available for integration into survey programs. At implementation stage, 
most of the data in the Integrated Portion of the CFDB should have come from a profiling
process that began in April 1986. However, no single list of business entities in-scope for a 
profile was available in April 1986. 

Administrative data played a major role in initiating the profiling process. They were used as
a starting point to construct the current Statistics Canada view of the business entity. A list
of business entities in-scope for an initial profile was assembled from administrative data 
sources. Section 2 describes how this was accomplished. Section 2.1 gives the frame 
requirements. A description of the data sources used to build the frame follows in Section 2.2. 
Section 2.3 shows how the frame unit was constructed and how the various data sources were 
combined to build the frame. 

Section 3 describes how administrative data will be used to detect potential changes in a 
business entity and then to initiate the maintenance profiling process. The results of a 
simulation study done to quantify the proposed use of administrative data sources are 
then presented. The paper concludes with a discussion of several issues that this study 
has raised. 


2. USE OF ADMINISTRATIVE DATA FOR INITIAL PROFILING 


2.1 Frame Requirements 


The first step in building the frame for initial profiling was to define the frame unit. The 
ideal one would be the business entity. However this entity was not available either internally 
or externally to Statistics Canada. The units available to us were essentially legal entities. It 
was necessary, then, to group legal entities to approximate business entities. The frame unit 
was defined as a grouping of legal entities subject to the following constraints: 


i. The definition of the business entity implies that it covers all legal entities linked through
ownership, where ownership is defined as owning more than 50% of the voting rights
of a legal entity. The grouping of legal entities through this ownership rule is restricted
to one level of foreign ownership outside Canada.


ii. There has to be a single Canadian legal entity that owns all other Canadian legal entities 
in the business entity. This is necessary because profiling contacts with the business entity 
could only be made in Canada. 


The next step was to determine which frame units would comprise the frame and what data 
was required for each. The frame from which business entities would be selected for an initial 
profiling contact and from which the initial picture of the business entity would be generated 
would contain all business entities in-scope for a contact. 

Business entities are in-scope for a profiling contact if they qualify to be members of the 
Integrated Portion of the CFDB. Membership is determined by criteria applied to the legal 
structure that describes the legal representation of the business entity. 

Legal structures can become members of the Integrated Portion in one of two ways. First, 
if the structure consists of only one legal entity then the legal entity is part of the Integrated 
Portion if its revenue during its fiscal year of interest is above a prespecified value. This 
prespecified value depends on the legal entity’s major industry and the location of its head 
office. Alternatively, if the legal structure consists of more than one legal entity then the legal 
structure is part of the Integrated Portion if at least one of the legal entities in the structure 
has a revenue above its appropriate prespecified value. 
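
The membership rule can be written as a single test, since the one-entity case is just a structure of length one. The sketch below assumes a hypothetical cut-off table keyed by major industry and head office province; the industries, provinces, and dollar values shown are invented for illustration and are not the prespecified values used for the CFDB.

```python
# Hypothetical revenue cut-offs keyed by (major industry, head office province).
CUTOFFS = {
    ("retail", "ON"): 5_000_000,
    ("mining", "AB"): 20_000_000,
}


def exceeds_cutoff(entity):
    """True if the legal entity's revenue is above the cut-off for its own
    industry and head office location."""
    return entity["revenue"] > CUTOFFS[(entity["industry"], entity["head_office"])]


def in_integrated_portion(legal_structure):
    """A legal structure (a list of legal entities) belongs to the Integrated
    Portion if at least one of its entities is above its own cut-off."""
    return any(exceeds_cutoff(entity) for entity in legal_structure)


if __name__ == "__main__":
    small_shop = {"industry": "retail", "head_office": "ON", "revenue": 800_000}
    big_mine = {"industry": "mining", "head_office": "AB", "revenue": 35_000_000}
    print(in_integrated_portion([small_shop]))            # single-entity structure
    print(in_integrated_portion([small_shop, big_mine]))  # multi-entity structure
```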

Therefore, in order to determine which business entities are in-scope, the following
information was required for every legal entity:


i. Relationships of ownership between legal entities. 
ii. Revenue in the fiscal year of interest, primary industry, and head office location. 


For business entities that qualify to be on the frame and, hence, to receive an initial pro- 
filing contact, information was required to select and contact the entity. The following was 
required to select the entity: 


i. All industries in which the business entity was involved so that the Wholesale and/or Retail 
industries could be contacted first. The surveys of these industries required a set of statistical 
entities that had been generated from a profiling contact before other surveys did. 


ii. The number of physical locations of all business entities that consist of one legal entity
or that consist of two legal entities of which the owner is foreign. This data item determined
the type of profiling contact that would be made as either a telephone contact by
Regional Office staff or a personal visit by Regional or Head Office staff.


iii. The province in which the ultimate Canadian corporate ownership was based. The prov- 
ince was used to distribute the workload of making the profiling contacts to regional offices 
according to their capacities. 


In order to contact the business entities, name and address were required for the legal entity 
at the top (excluding foreign owners) of the business entity. Contact data and any special 
reporting arrangements that surveys had recently used would be desirable. 


2.2 Data Sources 


The data sources which could be used were restricted, primarily, by the frame coverage 
requirements. This restriction eliminated sample lists and many industry specific lists such as 
survey frames. Only data sources that were lists of all legal entities potentially in-scope for a 
profiling contact that carried, at least, some of the required data items could be considered. 
The data sources that could at least be partially integrated by computer were: 


i. The Inter-Corporate Ownership Database (ICO) which is a list of all legal entities operating 
in Canada that are owned by either foreign or Canadian legal entities and their owners. 
The coverage of foreign legal entities is required to determine the ultimate owner. 


ii. The Current Business Register (BR) which is primarily a list of all legal entities that are 
employers. The number of physical locations of a legal entity, contact data (address and 
reporting arrangements) used by surveys, and the industries in which the legal entity operates 
are available here. 


iii. The Corporation Tax Base (CORP) which is a list of all legal entities that filed a corporate 
tax return with Revenue Canada, Taxation in a given year. The primary industry, the loca- 
tion of the Head Office, and revenue for the fiscal year are carried on this data source. 


iv. The Individual Tax Base (IND) which is a list of all individuals who filed a tax return with 
Revenue Canada, Taxation in a given year. Individuals who report self-employed income 
on their return are legal entities of interest to Statistics Canada economic surveys. Primary 
industry data and contact data are available from this tax base for each individual reporting 
self-employed revenue as is his/her revenue from self-employment. 


Both of the tax base data sources (CORP and IND) are administrative data files. Administra- 
tive data received monthly from Revenue Canada, Taxation regarding an employer’s payroll 
deductions are used to update the BR. The ICO data source is a census survey response file. 

None of these data sources provides complete coverage and all the required data items.
Rather, coverage could only be obtained by combining these data sources. The same is true 
for some required data items while for the rest more than one source could provide them. The 
strategy used to combine these data sources to obtain the best coverage and data quality is 
presented in the next section. 

A fifth source, the Quarterly Survey of Financial Statements, provided information on legal
entities that prepare consolidated financial statements. This source was used in manually 
refining the business entities on the frame. 


2.3 Frame Creation Procedures 


The challenge in creating the frame for initial profiling contacts lay in integrating four data 
sources that had each been designed for different purposes and had never been integrated to 
this extent before. This situation is common to users of administrative data. The task was even 
more complex because this was the first time many concepts established for the CFDB were 
applied. 

The constraints of limited time and resources forced the project team to make some assump- 
tions when creating the frame. However, the assumptions were justifiable since the picture used 
on the frame would be corrected through the profiling process. A simple description of the 
procedures used is presented in this section. 

There were three steps in the frame creation process, each of which is discussed in the 
following sections. 


i. Construct a list of all potential frame units;
ii. Determine which are in-scope; and
iii. Acquire selection and contact data.


2.3.1 Create Potential Frame Units 


The frame unit was constructed by grouping legal entities in the following manner to create 
business entities. The legal entities were first grouped into legal structures. One legal structure 
consisted of that set of legal entities related via ownership of more than 50%. Relationships 
involving foreign legal entities were accepted only if the foreign legal entity owned or was owned 
by a Canadian legal entity. When a foreign entity owned more than one Canadian entity, the 
legal structure was divided into as many business entities as there were Canadian entities directly 
owned by the foreign entity. In this way, a profiling contact would be made with the ultimate 
Canadian owner of each resulting business entity. Examples are provided in Figure 1. 
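
The grouping rule can be sketched as a traversal of majority-ownership links. This is a simplified illustration, not the procedure actually run against the ICO source: the function name, the data layout, and the example entities are hypothetical, and the sketch treats any Canadian entity whose majority owner is absent or foreign as the top of its own business entity.

```python
def build_business_entities(country, links):
    """country: {entity: country code}; links: [(owner, owned, percent)].
    Returns {top Canadian owner: sorted list of member legal entities}."""
    children, parent = {}, {}
    for owner, owned, pct in links:
        # Keep majority links that have a Canadian entity on at least one side.
        if pct > 50 and "CAN" in (country[owner], country[owned]):
            children.setdefault(owner, []).append(owned)
            parent[owned] = owner  # more than 50% implies at most one such owner

    entities = {}
    for entity, code in country.items():
        owner = parent.get(entity)
        if code != "CAN" or (owner is not None and country[owner] == "CAN"):
            continue  # foreign, or owned by another Canadian entity
        members, stack = set(), [entity]
        while stack:  # collect everything owned, directly or indirectly
            node = stack.pop()
            if node not in members:
                members.add(node)
                stack.extend(children.get(node, []))
        if owner is not None:
            members.add(owner)  # the direct foreign owner, one level only
        entities[entity] = sorted(members)
    return entities


if __name__ == "__main__":
    # Hypothetical structure: a foreign parent with two Canadian subsidiaries.
    country = {"A": "USA", "B": "CAN", "C": "CAN", "D": "CAN"}
    links = [("A", "B", 100), ("A", "C", 60), ("B", "D", 80)]
    print(build_business_entities(country, links))
```

With the foreign parent owning two Canadian entities directly, the single legal structure splits into two business entities, one headed by each directly owned Canadian entity.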

Individuals who reported self-employed income were considered as a legal structure con- 
taining only one legal entity. The ownership of corporations by individuals as well as relation- 
ships of joint venture between corporations were not considered in constructing business 
entities. 

Therefore, we can think of the set of business entities in-scope for an initial profiling con- 
tact as two mutually exclusive groups. The first group consists of legal entities that represent 
individuals who report self-employed income. The Individual (IND) tax base contains a list 
of all potential frame units in this group. 

The second group consists of legal entities that represent corporations operating in Canada.
The Inter-Corporate Ownership (ICO) data source was manipulated to provide a list of corporations
that belonged to legal structures containing more than one legal entity. A list of all
legal entities that are not owned by any other legal entity was obtained from the Corporation
tax base after elimination of those legal entities that were owned by other legal entities or were
owners themselves. That is, it was necessary to match the ICO source and the CORP tax base
to identify the overlap between them. Legal entities that appeared on both sources could thus
be identified to ensure that they would only appear once on the frame. Linkage between the
two sources was not straightforward and involved a clerical process because a common identification
number was often not available.


[Diagram: Example 1 and Example 2, each showing a LEGAL STRUCTURE and the resulting BUSINESS
ENTITY for legal entities A, B, C, and D, labelled (CAN) or (USA).]


Figure 1. Defining Business Entities


2.3.2 Determine In-Scope Frame Units 


The data required to determine if individuals reporting self-employed income were in-scope
was on the IND tax base. It was a simple step to determine if a legal entity was above its
appropriate prespecified cut-off.

The situation was more complex for corporations. The linkage achieved between ICO and 
CORP provided the data required to apply the cut-off rule. However, about 20% of the cor- 
porations on ICO could not be linked to CORP. In these cases an assumption was made which 
led to an overestimation of the set of business entities in-scope for an initial profile. It was 
assumed that legal structures which contained at least one unlinked corporation satisfied the 
frame inclusion conditions. Otherwise, legal structures were frame members if at least one cor- 
poration satisfied the cut-off rule. 
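
The inclusion rule for legal structures described above can be sketched as follows. This is a minimal illustration, not the production procedure: the field names (`linked_to_corp`, `size_measure`) and the use of a single numeric cut-off are assumptions made for the example, since the text speaks only of an "appropriate prespecified cut-off".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Corporation:
    name: str
    linked_to_corp: bool            # hypothetical flag: ICO record linked to the CORP tax base
    size_measure: Optional[float]   # hypothetical size variable; None when unlinked


def structure_in_scope(corporations: List[Corporation], cutoff: float) -> bool:
    """Return True if the legal structure is treated as a frame member."""
    # Any unlinked corporation: the structure is assumed to satisfy the
    # inclusion conditions (this is what leads to overestimation).
    if any(not c.linked_to_corp for c in corporations):
        return True
    # Otherwise at least one corporation must be above the cut-off.
    return any(c.size_measure is not None and c.size_measure > cutoff
               for c in corporations)
```

Because an unlinked corporation forces inclusion, the rule errs on the side of over-coverage, which matches the overestimation noted in the text.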


2.3.3 Acquire Selection and Contact Data


The result of the previous step was a proxy list of all business entities in-scope for an initial 
profiling contact. The data required for selection and contact described in Section 2.1 that are 
not already on the frame were available from the BR. The frame and the BR overlap because 
a majority of the frame units representing corporations and a smaller proportion of the frame 
units representing individuals are employers. Linkage between the frame and the BR was 
required so that data from the BR could be added to the frame for units found on both sources. 
That is, it was necessary to detect duplication between the two sources. 

It was even more difficult to link these two sources than it had been to link the ICO and 
CORP sources. This was due not only to the frequent absence of common identification 
numbers as in the ICO-CORP case but also because the BR resembles a business entity’s 
operating structure more than its legal structure. The name and address from the BR were used 
for linking when no common identification number was available. However, the names and 
addresses on the BR often refer to ‘trade’ or ‘operating’ locations which are sometimes dif- 
ferent from the ‘legal’ names and addresses on the ICO and CORP sources. When this occurred 
it was difficult to establish a link and hence eliminate duplication. 
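
The two-step linkage just described (common identification number first, name and address otherwise) might be sketched as below. The record layout (`id`, `name`, `address` keys) and the exact-match comparison on normalized strings are assumptions for illustration only; the actual linkage involved clerical work and is not specified at this level of detail.

```python
import re


def normalize(text):
    """Upper-case, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", text.upper()).split())


def link_to_br(frame_unit, br_records):
    """Return the matching BR record, or None if no link is found."""
    # Step 1: use the common identification number when the frame unit has one.
    unit_id = frame_unit.get("id")
    if unit_id is not None:
        for rec in br_records:
            if rec.get("id") == unit_id:
                return rec
    # Step 2: fall back on normalized name and address.
    key = (normalize(frame_unit["name"]), normalize(frame_unit["address"]))
    for rec in br_records:
        if (normalize(rec["name"]), normalize(rec["address"])) == key:
            return rec
    return None
```

A frame unit carrying a legal name will fail step 2 whenever the BR holds only a different trade name, which is precisely the difficulty noted above.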

There were some frame units for which no link to the BR was achieved either because they 
were non-employers and therefore not on the BR or the linkage procedures could not estab- 
lish the link. In these cases subsequent stages in the initial profiling process were amended to 
accommodate the frame limitations. Contact data of a lesser quality were taken from the tax 
base. The selection criteria were changed to reflect the absence of data on industrial breakdown 
and physical locations for these legal entities. 


[Figure 2. Frame for Initial Profiling. Sources shown: Current Business Register; Corporations
Tax Base; Individual Tax Base; Intercorporate Ownership Data Base.]

When a legal entity was involved in only one industry, the primary industry was available
from both the tax bases and the BR. It was therefore necessary to reconcile this common data
item when the sources disagreed. In such cases the BR industry was used since it was considered
more reliable.

A pictorial representation (not to scale) of the resulting frame is shown in Figure 2. 


2.3.4 Evaluate the Frame 


The quality of the resulting frame was assessed by three projects. First, the consistency of 
the frame with the specifications for creating it was verified. 

The second project involved comparing various distributions of the legal entities on the frame 
with the same distributions produced from an independent simulation of the Integrated Por- 
tion. The distributions did not differ significantly. 

Lastly the frame was assessed by comparing it with the BR. A sample of 30 of the larger 
units in the BR was matched to the frame for initial profiling. All of the entities were found 
but with great difficulty because the two sources use different concepts. 


2.4 Conclusion 


The frame strategy just described was based on some simplistic assumptions regarding cov- 
erage, data quality, and the way in which business entities operate. ‘Shortcuts’ were often used 
to satisfy the frame requirements. It was felt that this approach was justified because of the 
role of the frame as a provider of initial pictures of business entities that would be updated 
during the profiling process. The implications of making these assumptions are discussed in 
this section. 

The population of business entities in-scope for an initial profiling contact may contain 
duplicates and out-of-scope units. If so, then more profiling contacts than necessary will be 
made. This would increase Statistics Canada’s production costs. It would unduly burden the 
respondent with duplicate requests. Finally, the image of Statistics Canada could be adversely 
affected. 

The population may be underestimated. Nevertheless, the missing units will be profiled at 
a later date. This would delay the introduction of new large units into the Integrated Portion 
of the CFDB. The missing units would be covered by the Non-Integrated Portion in the interim 
rather than the Integrated Portion. 

Inaccurate selection and/or contact data could complicate or delay contact until accurate 
data could be found. The consequence in these cases is also an inaccurate CFDB until the pro- 
file is completed. 

These experiences demonstrate the complications introduced when administrative data are 
used. They also illustrate the care that must be taken in ensuring the compatibility of 
administrative data with one’s requirements. Examples were provided of the types of ensuing 
compromises that must be made when such a compatibility cannot reasonably be reached. 


3. USE OF ADMINISTRATIVE DATA IN SUBSEQUENT 
MAINTENANCE PROFILES 


3.1 Cyclical and Reaction Profiling 


There will be two types of subsequent maintenance profiling, namely cyclical and reaction
profiling. Each of these is explained below.

Cyclical profiling is the process that will ensure that all business entities in the profile
population are reprofiled within a certain period of time. According to current budget
forecasts, this period is expected to be two years. Time elapsed since the business entity’s
last profile will be the factor that determines eligibility for cyclical profiling. Other factors will
be taken into account to prioritize the eligible units within cyclical profiling.
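
As an illustration of the eligibility rule, the following sketch flags an entity once the cycle length has elapsed since its last profile. The function name and the handling of 29 February are choices made for this example; the two-year default reflects the period expected under current budget forecasts.

```python
from datetime import date

CYCLE_YEARS = 2  # period expected under current budget forecasts


def eligible_for_cyclical(last_profiled, today, cycle_years=CYCLE_YEARS):
    """True when at least `cycle_years` have elapsed since the last profile."""
    try:
        due = last_profiled.replace(year=last_profiled.year + cycle_years)
    except ValueError:
        # Last profile on 29 February and the due year is not a leap year.
        due = last_profiled.replace(year=last_profiled.year + cycle_years, day=28)
    return today >= due
```

The other factors used to prioritize eligible units are not specified in the text and are therefore left out of the sketch.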

Reaction profiling is the process that will profile a business entity when information received
from a source other than profiling indicates that changes may have occurred to that business
entity and that its statistical image on the register may no longer be valid. Reaction profiling
will keep the CFDB more up-to-date than if only the cyclical profiling mechanism were used.
Some of the sources of information on changes are the various
files of administrative data received regularly at Statistics Canada.


3.2 Sources of Administrative Data That Can be Used 


The three sources of administrative data that Statistics Canada can use to update its cen- 
tral register that are discussed in this paper are: 


- the Individual Tax Base; 
- the Corporation Tax Base; and 
- data on payroll deduction accounts captured by the tax authorities. 


Generally, individuals and corporations file a single tax return for a reference year. How- 
ever, it is possible to have more than one return for a reference year if, for example, a cor- 
poration changed its fiscal year end with the approval of the tax authorities. Nevertheless, one 
can say that tax returns are an annual source of changes. 

The receipt of the tax bases at Statistics Canada does not occur at a single point in time. 
In fact, Statistics Canada receives files of tax data regularly for a reference year over a period 
of two years. Thus, one could perform monthly updates to the register from tax data but each 
register record would generally be updated only once a year. 

On the other hand, an employer is generally expected to send remittances for his payroll 
deduction accounts on a monthly basis. In turn, Statistics Canada receives a file of payroll 
deduction account data once a month. Thus, monthly updates can be made to the register from 
payroll deduction account data and each register record can in theory be modified every month. 

Note that there are other sources of administrative data that could be used. They are not 
discussed in this paper because they are not obtained on a universe basis or on a regular basis. 
They are nevertheless worth mentioning. These are: 


- limited information on corporations that have not filed a tax return but are believed to
be active, captured by the tax authorities;


- additional data captured from a sample of tax returns by Statistics Canada; and 


- data on a tax authority form filled out by employers when they request a payroll deduc- 
tion account, captured by Statistics Canada. 


3.3 Signals of Change 


Signals of change were developed from the administrative sources described in the previous
section. These signals identify administrative records for which changes to their associated
statistical entities may have occurred. They also inform the register that reaction profiling may
be desirable for these entities to keep the register up-to-date.


Table 1
Signals by Administrative Source

Administrative Source                 Number of           Examples
                                      Distinct Signals
Annual Individual Tax Returns               50            Change from single province of taxation
                                                          to multiple jurisdiction
Annual Corporation Tax Returns              49            Start of a joint venture
Monthly Payroll Deduction Accounts          38            New account with descriptions in the
                                                          name that identify a corporation


The signals depend on the administrative source. For each of the three sources listed in Section 3.2
the signals consist of comparison tests between new data received for an administrative record 
and the last data received for the same record from the same source. These tests may involve 
a single field or a group of fields and may be conditional on a single field or several fields. These 
comparison tests attempt to identify real world events that have an impact on the statistical 
entities and not only on the administrative entities. Remember that the statistical entities exist 
for the purpose of economic statistical programs and often are completely different from the 
legal-administrative reality. Therefore, these comparison tests should optimize the detection of 
changes in the administrative data that reflect a change in the statistical entities. As an example,
a change of ownership of a manufacturing plant may mean the death of an administrative
record and the birth of a new one. For the statistical entities, however, it may mean no change,
as the same establishment, with its capability to provide the required data, may still exist.
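
A comparison test of the kind described above can be sketched as a function of the last and the new data for a record. The example implements one signal listed in Table 1 (change from a single province of taxation to multiple jurisdictions); the record layout, with a `provinces` field, is hypothetical.

```python
def province_signal(last_record, new_record):
    """Signal a change from single-province taxation to multiple jurisdictions."""
    was_single = len(last_record["provinces"]) == 1
    now_multiple = len(new_record["provinces"]) > 1
    return was_single and now_multiple


def run_signals(last_record, new_record, tests):
    """Apply each named comparison test; return the names of the signals raised."""
    return [name for name, test in tests.items() if test(last_record, new_record)]
```

A signal raised in this way indicates only that the administrative record changed; whether the statistical entity changed still has to be established.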

If the frame were updated directly from the changes noted in the administrative records, the
consequence would be a high incidence of apparent deaths and births in the statistical entities
and a risk of incomplete or duplicated coverage. Thus there is a requirement to contact
respondents, or at least to perform in-house research using all available documentation, to find out,
for signaled administrative records, what happened to the statistical entities. This “translation”
process is far from trivial, and its resolution constitutes the purpose of reaction profiling.

The number of signals that were determined from each source together with some signal 
examples are presented in Table 1. One should however note the following points in studying 
the data on the number of signals. Some signals are very refined while others are not. It was 
often decided to split an original signal into mutually exclusive sub-signals because it was felt 
that it may be more informative in determining the action to take from the signal. The most 
trivial example concerns the Payroll Deduction Accounts. Eighteen of the 40 signals repre- 
sent changes in the estimated number of employees covered by the account. The 18 signals 
distinguish between increases and decreases in the estimated number and the magnitude for 
each of them. It was thought that such a breakdown would be informative to prioritize the 
clerical work. Nevertheless one could consider these signals as one. 
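
The splitting of one signal into mutually exclusive sub-signals by direction and magnitude might look like the sketch below. The magnitude bands (10% and 50% relative change) are invented for the example; the paper does not give the thresholds actually used for the payroll deduction account signals.

```python
def employee_change_signal(last_count, new_count, bands=(0.10, 0.50)):
    """Classify a change in estimated employees into one mutually exclusive sub-signal.

    `bands` holds hypothetical upper limits on the relative change; a change
    larger than the last limit falls in one further, open-ended band.
    """
    if last_count <= 0 or new_count == last_count:
        return None  # no basis for comparison, or no change
    direction = "increase" if new_count > last_count else "decrease"
    relative = abs(new_count - last_count) / last_count
    for i, upper in enumerate(bands):
        if relative <= upper:
            return f"{direction}_band{i + 1}"
    return f"{direction}_band{len(bands) + 1}"
```

Each record receives at most one sub-signal, so the sub-signals can be used to rank clerical work or be collapsed back into a single signal.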

It is expected that even though tax returns are processed regularly, a given return will
generally generate signals at most once per reference year while a given payroll deduction
account may generate a signal or signals every month. What is of more interest therefore is not
the number of signals defined per source but the number of records that are identified by these
signals. This would give an idea of the amount of clerical resources that will have to be invested
to update the register from administrative sources. A simulation study was thus undertaken
to address this issue.


Survey Methodology, June 1989


3.4 Simulation Study 


The simulation study consisted of applying the signals previously described to the following 
populations: 


- the individual tax returns for fiscal periods that ended in 1984 to detect changes that had 
taken place during these periods; 


- the corporation tax returns for fiscal periods that ended in 1984 to detect changes that 
happened during these periods; 


- the payroll deduction account of the beginning of October 1985 to detect changes that 
had occurred since the beginning of September 1985. 


The results of the simulation study are presented in Table 2. The following observations 
can be made on the results: 


- A very large number of tax returns generate signals: only about one eighth
of the individual tax returns and one fifth of the corporation tax returns do not generate
any signals.


- There are 8,258 payroll deduction accounts that generated signals for a one-month period.
If one supposes uniformity of the payroll deduction account signals over months, there
would be almost 100,000 accounts signaled in a year. Note that it is likely that accounts
would be signaled in more than one month and therefore there would be duplicates if one
cumulated the signals.


- If all records signaled in a year are added, it gives the grand total of 244,269 signaled 
records. However, it is obvious that signals are duplicated between the administrative 
sources. For example, a change to the legal name of a business could be found on the 
tax return as well as on each of its payroll deduction accounts. 
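
The arithmetic behind these observations can be checked directly from the counts reported in Table 2. The short script below is only a verification aid; the dictionary keys are labels chosen for the example.

```python
# Counts taken from Table 2 (payroll figures are for a single month).
signaled = {"individual": 63_446, "corporation": 81_727, "payroll_monthly": 8_258}
population = {"individual": 72_190, "corporation": 102_688, "payroll_monthly": 134_973}

# Uniformity over months: twelve times the one-month count.
payroll_annual = 12 * signaled["payroll_monthly"]   # 99,096, i.e. almost 100,000

# Grand total of signaled records in a year, ignoring duplication.
grand_total = signaled["individual"] + signaled["corporation"] + payroll_annual  # 244,269

# Percentage signaled per source, rounded as in Table 2.
pct = {k: round(100 * signaled[k] / population[k], 1) for k in signaled}

print(payroll_annual, grand_total, pct)
```

The grand total is an upper bound on distinct signaled records, since the same account can be signaled in several months and the same event can appear on more than one source.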


3.5 Questions Raised 


The results of the simulation study as well as an examination of the role of the signals raise
a number of issues with respect to the profiling activities. Six of these issues are presented below.


Table 2
Results of Simulation Study

Administrative Source           Number in the          Number       Percentage
                                Profile Population     Signaled     Signaled
Individual Tax Returns                72,190             63,446        87.9
Corporation Tax Returns              102,688             81,727        79.6
Payroll Deduction Accounts           134,973              8,258         6.1


3.5.1 Performance of Signals in Detecting Change(s) to Statistical Entities


The signals will attempt to flag legal and/or operating entities involved in real world events 
that have an impact on the statistical entities. An update will then be necessary on the central 
register to maintain the quality of the statistical products. Are the signals really reflecting real 
world events that affect the statistical entities or are there some that have no impact? If some 
are useless, work will be generated for no purpose. 

A small-scale survey was conducted in 1986 to determine the usefulness of the signals with
respect to the detection of changes to the statistical entities. However, for various reasons, the
only signals that could be used were those of the simulation study, which refer to changes between
tax returns of taxation years 1983 and 1984. The time lag between the reference period
of the signals and the survey period (1986) therefore caused recall difficulties for the respondents.
This led to the inclusion of events which took place after the period as well as the omission of
events which did occur in the reference period. The survey was therefore inconclusive and no other
attempt has been made since then.


3.5.2 Repetitiveness of Signals 


Signals will be received over time and from different independent sources. The tax returns 
in particular suffer from noticeable time delays. As a given signal is received, the CFDB may 
have already been updated to reflect the real world event behind the signal. This update may 
have been the result of processing a signaled record from another source or of conducting 
cyclical profiling or of incorporating feedback received from surveys. Therefore, signals cannot 
be treated independently of the CFDB to decide to perform a reaction profile. However, how 
should a signal be checked against the CFDB to see if the CFDB was already updated? As an 
example, if a large increase in revenue is flagged on a corporation tax return, how should one 
check if the CFDB was already updated to reflect the real world event behind this increase when 
one does not know the real world event behind it? 


3.5.3 Omission of Signals of Change 


Similarly, some records will not get signaled. Will the absence of signals definitively mean
that no real world event occurred that requires the statistical structure to be updated? Should other
signals be developed to cover omissions? Again, the survey previously mentioned was
inconclusive in answering these questions.


3.5.4 Availability of Resources to Handle Signaled Records 


As the simulation study showed, a large number of records will be signaled. These
will require manual work. It is likely that there will not be sufficient resources to
perform all this work. How should the total amount of resources to be devoted to reaction
profiles be determined and how should this total amount be used to handle the signaled
records? If constraints on resources demand that some signals be ignored, how will these
be determined?


3.5.5 Response Burden 


The results of the simulation study suggest that businesses will be contacted more often than
every second year to check for frame changes other than through regular survey activity. This
will increase response burden. Can a trade-off be established between increase in response
burden and out-datedness of the register? What should this trade-off be?


3.5.6 Role of Cyclical Profiling


The large number of records signaled by the tax returns in the simulation study raises a
question about the usefulness of cyclical profiling. The number of records subject to cyclical
profiling and not to reaction profiling can be deduced to be very small. First, suppose the results
of the simulation study in terms of numbers hold for a second year. Then suppose the records
signaled in the second year are not all the same as in the first year, so that some records are newly
signaled and some of the first year’s signaled records are not signaled in the second year. Then it
can be safely assumed that the number of records which will not get a signal over two years will be
very small. There may be only a few records left which will not be signaled in either year.
These will in fact represent the maximum target population for cyclical profiling. Will it be
necessary to perform a profile for these entities, knowing that they are signaled neither by the
Payroll Deduction Accounts nor by the tax returns?
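
The argument above can be made concrete with the Table 2 counts. The sketch below computes, for each tax source, two illustrative figures: the one-year count of unsignaled records, which is the largest the two-year count could be (it is reached only if exactly the same records go unsignaled in both years), and the count expected if signaling in the two years were statistically independent. Independence is an assumption introduced here purely for illustration; the paper does not make it.

```python
# Table 2 counts for the two tax sources.
population = {"individual": 72_190, "corporation": 102_688}
signaled = {"individual": 63_446, "corporation": 81_727}


def unsignaled_over_two_years(n_population, n_signaled):
    """Return (upper bound, expected count under assumed independence)."""
    unsignaled_one_year = n_population - n_signaled
    share = unsignaled_one_year / n_population
    # Illustrative assumption: the two years behave independently.
    expected_if_independent = n_population * share ** 2
    return unsignaled_one_year, expected_if_independent


bounds = {src: unsignaled_over_two_years(population[src], signaled[src])
          for src in population}
for src, (upper, indep) in bounds.items():
    print(f"{src}: at most {upper}, about {indep:.0f} if years were independent")
```

Any real figure depends on how strongly signaling persists from year to year for the same record, which the simulation study did not measure.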


4. CONCLUSION 


Section 2 has shown how administrative data were used to build a frame for initial profiling. 
Administrative data offered extensive coverage. However, it was also seen that conceptual dif- 
ferences between one’s requirements and administrative data can lead to complications requiring 
simplifying assumptions and compromises. 

The resulting frame supported the initial profiling of all business entities except the most 
complex ones. In these cases the approximation given by the frame could not be accepted. 
Rather, extensive research was conducted on each business entity using elements such as public 
annual reports and survey responses. 

The frame also played an important role in initializing the CFDB. It was used along with 
the Business Register to identify the members of the Integrated Portion. 

The method by which administrative data will be used to initiate a maintenance profile was 
described in Section 3. Signals of change will be derived from various administrative sources 
and will generate requests to verify profiles. Many issues were raised in this respect. These issues 
are being addressed by the various design teams responsible for implementing the CFDB update 
strategy. A solution being investigated to solve some issues is to prioritize signals depending,
for example, on the length of time since the entity was last profiled. Another solution is to
develop a self-learning process. Experience will dictate which signals are useful and should be
kept. Therefore, substantial work is still required before the process stabilizes in production.


GUIDELINES FOR MANUSCRIPTS


Before having a manuscript typed for submission, please examine a recent issue (Vol. 10,
No. 2 and onward) of Survey Methodology as a guide and note particularly the following
points:

1. Layout

1.1 Manuscripts should be typed on white bond paper of standard size (8½ x 11 inch),
one side only, entirely double spaced with margins of at least 1½ inches on all sides.

1.2 The manuscripts should be divided into numbered sections with suitable verbal titles. 

1.3 The name and address of each author should be given as a footnote on the first page
of the manuscript. 

1.4 Acknowledgements should appear at the end of the text. 

1.5 Any appendix should be placed after the acknowledgements but before the list of 
references. 

2. Abstract

The manuscript should begin with an abstract consisting of one paragraph followed 
by three to six key words. Avoid mathematical expressions in the abstract. 

3. Style 

3.1 Avoid footnotes, abbreviations, and acronyms. 

3.2 Mathematical symbols will be italicized unless specified otherwise except for functional
symbols such as “exp(·)” and “log(·)”, etc.

3.3 Short formulae should be left in the text but everything in the text should fit in single
spacing. Long and important equations should be separated from the text and numbered
consecutively with arabic numerals on the right if they are to be referred to later.

3.4 Write fractions in the text using a solidus.

3.5 Distinguish between ambiguous characters (e.g., w, ω; o, O, 0; l, 1).

3.6 Italics are used for emphasis. Indicate italics by underlining on the manuscript. 

4. Figures and Tables 

4.1 All figures and tables should be numbered consecutively with arabic numerals, with 
titles which are as nearly self explanatory as possible, at the bottom for figures and 
at the top for tables. 

4.2 They should be put on separate pages with an indication of their appropriate place- 
ment in the text. (Normally they should appear near where they are first referred to). 

5. References

5.1 References in the text should be cited with authors’ names and the date of publication. 
If part of a reference is cited, indicate after the reference, e.g., Cochran (1977, p. 164). 

5.2 The list of references at the end of the manuscript should be arranged alphabetically
and for the same author chronologically. Distinguish publications of the same author
in the same year by attaching a, b, c to the year of publication. Journal titles should
not be abbreviated. Follow the same format used in recent issues.


In This Issue


The risks involved in using standard statistical methods for the analysis of data from surveys 
with complex designs are becoming well-known. The special topic section in this issue contains 
three papers which provide guidance for the analysis of categorical data from such surveys. Tim 
Holt’s efforts were instrumental in putting this section together. 

The paper by Rao, Kumar and Roberts, which is the first discussion paper published in Survey 
Methodology, reviews developments in the analysis of cross-classified categorical data, extends 
them, and applies them to data from two large, complex surveys. The authors also briefly discuss 
computational issues. Comments by Fay, Skinner and Molina and a reply by Rao, et al. follow 
the paper. 

Thomas describes a Monte Carlo study used to investigate several methods of obtaining 
simultaneous confidence intervals for proportions under a two-stage clustered design. He shows 
that some methods behave poorly, with actual coverage rates quite different from the nominal 
ones. Thomas concludes with guidelines on the choice of methods to use in practice. 

The final paper in the section on data analysis for complex surveys, by Morel, deals with logistic 
regression. Using the results of a Monte Carlo study, he shows that for small samples, a modified 
Taylorization method for estimating a covariance matrix results in smaller biases than the usual 
delta method. 

The bibliography by Nathan on randomized response which appeared in the previous issue 
of Survey Methodology attests to the large amount of research which has been devoted to the 
subject. In this issue, Franklin develops another approach to the randomized response model 
for sampling from dichotomous populations. The model is general in that it permits the use of 
randomization from a continuous distribution and multiple trials per respondent. Special atten- 
tion is given to the case of randomization using the normal distribution function. 

MacGibbon and Tomberlin examine the problem of small area estimation with complex survey 
designs. Their empirical Bayes estimator is a compromise between the highly variable but unbiased 
classical estimator and the more stable but potentially highly biased synthetic estimator. 

A method of updating a PPSWOR sample which attempts to retain the same sample of pri- 
mary sampling units is presented by Sunter. The method differs from earlier ones proposed by 
Kish and Scott (1971) and Fellegi (1963) in that it is valid for any sample size and does not require 
enumeration of all possible samples. The method is of particular importance for multistage survey 
samples which must be updated, but for which the cost of introducing new PSUs may be high. 

Revenue Canada tax files and Family Allowance files are used in Canada to provide popula- 
tion estimates for provinces in non-census years. Verma and Raby examine the consistency of 
the estimates derived from these two sources. A comparison with the 1986 Census counts is also 
made. 

Swanson presents a method of obtaining confidence intervals for post-censal population
estimates. He shows that a Wilcoxon test can be used to determine if a change in model, due
to post-censal structural changes, is required. Using empirical data, Swanson shows that ignoring
such a change leads to confidence intervals whose coverage is lower than expected.


Analysis of Sample Survey Data Involving
Categorical Response Variables: 
Methods and Software 


J.N.K. RAO, S. KUMAR, and G. ROBERTS¹


ABSTRACT 


During the past 10 years or so, rapid progress has been made in the development of statistical methods 
of analysing survey data that take account of the complexity of survey design. This progress has been 
particularly evident in the analysis of cross-classified count data. Developments in this area have included 
weighted least squares estimation of generalized linear models and associated Wald tests of goodness 
of fit and subhypotheses, corrections to standard chi-squared or likelihood ratio tests under loglinear 
models or logistic regression models involving a binary response variable, and jackknifed chi-squared
tests. This paper illustrates the use of various extensions of these methods on data from complex surveys. 
The method of Scott, Rao and Thomas (1989) for weighted regression involving singular covariance 
matrices is applied to data from the Canada Health Survey (1978-79). Methods for logistic regression 
models are extended to Box-Cox models involving power transformations of cell odds ratios, and their 
use is illustrated on data from the Canadian Labour Force Survey. Methods for testing equality of 
parameters in two logistic regression models, corresponding to two time points, are applied to data from 
the Canadian Labour Force Survey. Finally, a general class of polytomous response models is studied, 
and corrected chi-squared tests are applied to data from the Canada Health Survey (1978-79). Software 
to implement these methods using the SAS facilities on a mainframe computer is briefly described.


KEY WORDS: Corrections to chi-squared tests; Logistic regression; Power transformations; Wald tests; 
Weighted least squares. 


1. INTRODUCTION 


Standard statistical methods, based on the assumption of independent identically distributed 
observations, are being used extensively by researchers in the social and health sciences, and 
in other subject matter areas. These methods have also been implemented in standard statistical 
packages, including SPSSX, BMDP, SAS and GLIM. In practice, however, much data are 
obtained from complex sample surveys involving clustering and stratification, so that the 
application of standard methods to these data without some adjustment for survey design can 
lead to erroneous inferences. In particular, standard errors of parameter estimates and 
associated confidence intervals can be seriously understated if the complexity of the sample 
design is ignored in the analysis of data. Moreover, the actual type I error rates of tests of 
hypotheses can be much bigger than the nominal levels. Standard exploratory data analyses, 
e.g., residual analysis to detect model deviations, are also affected. Kish and Frankel (1974) 
and others drew attention to some of these problems with standard methods, and emphasized 
the need for new methods that take proper account of the complexity of survey design. During 
the past 10 years or so, rapid progress has been made in the development of such methods, 
particularly for analysing cross-classified count data. This paper will focus on the analysis of
count data, but it should be noted that important results on other types of analyses have also
been obtained: regression analysis (Fuller 1975; Nathan and Holt 1980; Pfefferman and
Nathan 1981; Scott and Holt 1982), principal component analysis (Skinner, Holmes and
Smith 1986), factor analysis (Fuller 1986), and logistic regression involving continuous covariates
(Binder 1983).


¹ J.N.K. Rao, Department of Mathematics and Statistics, Carleton University, Ottawa, Ontario; S. Kumar and
G. Roberts, Social Surveys Methods Division, Statistics Canada, Ottawa, Ontario.


Rao and Scott (1984) have made a systematic study of the impact of survey design on stan- 
dard Pearson chi-squared or likelihood ratio tests for multiway tables of counts, under hierar- 
chical log-linear models. They have also obtained simple first order corrections to standard 
tests which can be computed from published tables that include “design effects” for cell
estimates and marginal totals, thus facilitating secondary analyses from published reports (see 
also Gross 1984; Bedrick 1983; Rao and Scott 1987). These first order corrections take account 
of the design in the sense that the actual type I error rates of tests based on the corrected statistics 
are closer to nominal levels, compared to the standard tests which could have greatly inflated 
type I error rates. More accurate second order corrections, based on the Satterthwaite 
approximation to a weighted sum of independent $\chi^2$ variables, were also developed by Rao and Scott 
(1984), but these tests require the knowledge of a full estimated covariance matrix of cell 
estimates. Alternative methods that take account of the survey design include the Wald statistics 
based on weighted least squares (Koch, Freeman and Freeman 1975), and the jackknifed chi- 
squared tests (Fay 1985), all requiring either the full estimated covariance matrix or access to 
cluster-level data. Fay (1985) and Thomas and Rao (1987) have shown that the Wald statistic, 
although asymptotically correct, can become highly unstable as the number of cells in the 
multiway table increases and the number of sample clusters decreases, leading to unacceptably 
high type I error rates compared to the nominal level. On the other hand, Fay’s jackknife tests 
and the Rao-Scott corrections have performed well under quite general conditions. In some 
cases, the instability in the Wald statistic may be remedied by collapsing the table according 
to eigenvectors associated with the nonnegligible eigenvalues of the estimated covariance matrix 
adjusted for singularities caused by linear constraints on the probabilities, as proposed by Singh 
(1985); see also Singh and Kumar (1986). 


Roberts, Rao and Kumar (1987) assumed a logistic regression model for the cell (domain) 
proportions associated with a binary response variable, and obtained first order corrections 
to standard chi-squared and likelihood ratio tests of goodness-of-fit and nested hypotheses. 
Simple upper bounds to first order corrections, depending only on the design effects of cell 
response proportions, were also obtained to facilitate secondary analyses from published tables. 
Scott (1986) proposed an alternative method which uses standard tests on transformed data 
derived from the original data and the cell design effects. Roberts, Rao and Kumar (1987) also 
provided second order corrections to standard tests, but these require access to a full estimated 
covariance matrix of cell response proportions. Diagnostics for detecting outliers and influential 
points were developed as well, again taking the survey design into account. 


The primary purpose of this paper is to present various extensions of the previous methods 
and illustrate their use on data from large-scale surveys, including the Canada Health Survey 
(1978-1979) and the Canadian Labour Force Survey. It is assumed, throughout the paper, that 
the user has access to a full estimated covariance matrix of cell estimates. In Section 2, weighted 
least squares (WLS) estimators of the parameters of generalized linear models having singular 
covariance matrices, caused by linear constraints on the probabilities (or proportions), are 
presented. Associated Wald tests of goodness-of-fit and of subhypotheses are also provided. 
A smoothed version of the WLS estimators, and associated Wald tests of subhypotheses are 
given as well. These methods should be used only when the number of cells in a table is small 
and/or the number of sample clusters in the survey design is relatively large. 

The methods for logistic regression models are extended, in Section 3, to Box-Cox models 
involving power transformations of cell odds ratios. These models, which include the logistic 
regression model as a special case, could provide significantly better fits than the logistic regres- 
sion models, as demonstrated by Guerrero and Johnson (1982) in the context of binomial 
proportions. 

Methods for testing equality of parameters in two logit models, corresponding to two dif- 
ferent time periods, are given in Section 4. If the hypothesis of equality is accepted, one could 
obtain "smoothed" estimates of cell proportions for the current period that are more efficient 
than the corresponding smoothed estimates based only on the current period data. 

Section 5 gives an extension of the type of results obtained for logistic regression models 
to a general class of polytomous response models. The special case of McCullagh’s (1980) 
ordered response model is studied in detail. 

Finally, an account of the software for implementing the above methodology is given in 
Section 6. 


2. WEIGHTED LEAST SQUARES ESTIMATORS 
AND WALD TESTS 


The approach of Koch, Freeman and Freeman (1975) is designed to estimate the parameters 
of generalized linear models of the form $g^*(p) = X^*\beta^*$, using a sample estimate, $\hat{p}$, of the 
population cell probabilities denoted by a $T$-vector $p$, and a consistent estimate of $\text{cov}(\hat{p}) = V_p$ 
(say). In this method, the asymptotic covariance matrix of the $u$-vector $g^*(\hat{p})$ is assumed 
to be nonsingular ($u < T$); however, many models, including the traditional loglinear model, 
are of the form $g(p) = X\beta$, where $g(\hat{p})$ is a $T$-vector with a singular asymptotic covariance 
matrix, and $X$ is a $T \times r$ full rank matrix of known constants. It is possible to reduce the latter 
models to the nonsingular form $g^*(p) = X^*\beta^*$, as done by Grizzle and Williams (1972) for 
the loglinear model, but Scott, Rao and Thomas (1989) have developed the following unified 
approach for singular models, by appealing to the optimal theory for linear models having 
singular covariance matrices. 

The cell probabilities $p$ and $\hat{p}$ are subject to linear constraints of the form $K'p = \pi$ and 
$K'\hat{p} = \pi$, where $K$ is a $T \times L$ full rank matrix of known constants and $\pi$ is an $L$-vector of 
known constants $\pi_i$ ($L < T$). As a result, the covariance matrix of $\hat{p}$ will be singular. For 
example, in the case of stratified sampling with complex sample designs within strata, we can 
write $K = I_L \otimes 1_m$, $\pi_i = n_i/n$ $(i = 1,\ldots,L)$ and $p = (p_{11},\ldots,p_{1m},\ldots,p_{L1},\ldots,p_{Lm})'$ with 
$p_{ij} = (n_i/n)\tilde{p}_{ij}$, where $\tilde{p}_{ij}$ is the $j$-th category probability within the $i$-th stratum ($\sum_j \tilde{p}_{ij} = 1$; 
$i = 1,\ldots,L$; $j = 1,\ldots,m$), $n_i$ is the sample size from the $i$-th stratum, $\sum n_i = n$, $1_m$ is an 
$m$-vector of 1's, $I_L$ is the identity matrix of order $L$ and $\otimes$ denotes the Kronecker product. 

Assume that $X\beta$ can be written as $X_0\beta_0 + X_1\beta_1$, where $X_0$ is a $T \times L$ matrix such that 
$K'H^{-1}X_0$ is nonsingular and where $H = (\partial g/\partial p)'$ is the $T \times T$ matrix of partial derivatives 
of $g(p)$. In particular, $X_0$ can be taken as $K$ if the constraint matrix $K$ is included in $X$, as 
frequently assumed. Since restrictions on $p$ imply constraints on the parameters $\beta$, $\beta_0$ can be 
determined exactly from the constraints, for a given $\beta_1$. 


Weighted least squares estimators 


The model may be written as 

$$\hat{g} = g(\hat{p}) = X\beta + \delta \qquad (2.1)$$

where $\delta$ is the error vector with $\text{plim}\,\delta = 0$, and has a singular asymptotic covariance matrix 
$V_g = HV_pH'$ which is consistently estimated as $\hat{V}_g = \hat{H}\hat{V}_p\hat{H}'$, assuming that $\hat{V}_p$ is a consistent 
estimator of $V_p$. Here $\hat{H} = H(\hat{p})$. Scott, Rao and Thomas (1989) derived an asymptotically 
best linear unbiased estimator (ABLUE) of $\beta_1$ as 

$$\hat{\beta}_1 = (\tilde{X}_1'M\tilde{X}_1)^{-1}\tilde{X}_1'M\hat{g}, \qquad (2.2)$$

where 

$$M = (\hat{V}_g + X_0X_0')^{-1} \qquad (2.3)$$

is a nonsingular generalized inverse of $\hat{V}_g$ and 

$$\tilde{X}_1 = (I - X_0X_0'M)X_1. \qquad (2.4)$$

A consistent estimator of the asymptotic covariance matrix of $\hat{\beta}_1$ is given by 

$$\text{est cov}(\hat{\beta}_1) = (\tilde{X}_1'M\tilde{X}_1)^{-1}. \qquad (2.5)$$
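
As an illustration of (2.2) to (2.5), the following minimal sketch computes the ABLUE with NumPy. The function name `ablue`, the nine cell proportions, the multinomial covariance matrix standing in for a design-based estimate, and the small design matrix are all assumptions made for the example; they are not taken from the paper.

```python
import numpy as np

def ablue(g_hat, Vg_hat, X0, X1):
    """ABLUE of beta_1 for a model with singular covariance, following (2.2)-(2.5)."""
    T = g_hat.shape[0]
    M = np.linalg.inv(Vg_hat + X0 @ X0.T)               # (2.3)
    X1_tilde = (np.eye(T) - X0 @ X0.T @ M) @ X1         # (2.4)
    A = X1_tilde.T @ M @ X1_tilde
    beta1 = np.linalg.solve(A, X1_tilde.T @ M @ g_hat)  # (2.2)
    cov_beta1 = np.linalg.inv(A)                        # (2.5)
    return beta1, cov_beta1, M

# Hypothetical inputs: nine cell proportions summing to one, with a
# multinomial covariance matrix used in place of a survey-based estimate.
p_hat = np.array([0.22, 0.15, 0.17, 0.02, 0.01, 0.01, 0.20, 0.10, 0.12])
p_hat = p_hat / p_hat.sum()
n = 2505
Vp = (np.diag(p_hat) - np.outer(p_hat, p_hat)) / n
H = np.diag(1.0 / p_hat)            # derivative of g(p) = log p
Vg = H @ Vp @ H.T                   # singular: Vg @ p_hat = 0
g_hat = np.log(p_hat)

# Hypothetical 3 x 3 loglinear design: two row effects, two column effects
# (sum-to-zero coding) and one linear-by-linear interaction column.
code = {1: [1, 0], 2: [0, 1], 3: [-1, -1]}
X1 = np.array([code[i] + code[j] + [(i - 2) * (j - 2)]
               for i in (1, 2, 3) for j in (1, 2, 3)], dtype=float)
X0 = np.ones((9, 1))                # constraint matrix K (probabilities sum to 1)
X = np.hstack([X0, X1])

beta1, cov_beta1, M = ablue(g_hat, Vg, X0, X1)
```

In this sketch `M` behaves as a generalized inverse of the singular matrix `Vg`, and `beta1` coincides with the last five components of the full estimator $(X'MX)^{-1}X'M\hat{g}$.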

Wald tests 


Letting $\hat{\beta} = (X'MX)^{-1}X'M\hat{g} = (\hat{\beta}_0', \hat{\beta}_1')'$, a Wald test of goodness of fit of the model 
(2.1) is given by 

$$W = (\hat{g} - X\hat{\beta})'M(\hat{g} - X\hat{\beta}) \qquad (2.6)$$

which is distributed asymptotically as a $\chi^2$ variable with $T - r$ degrees of freedom (d.f.). The 
model is considered untenable at the $\alpha$-level if $W > \chi^2_{T-r}(\alpha)$, the upper $\alpha$-point of $\chi^2$ with 
$T - r$ d.f. 

Given the model (2.1), tests of linear hypotheses on the model parameters $\beta_1$ can also be 
obtained. A Wald test of the linear hypothesis $C_1\beta_1 = c_1$ is given by 

$$W_1 = (C_1\hat{\beta}_1 - c_1)'[C_1\,\text{est cov}(\hat{\beta}_1)\,C_1']^{-1}(C_1\hat{\beta}_1 - c_1) \qquad (2.7)$$

which is distributed asymptotically as a $\chi^2$ variable with $h$ d.f., where $C_1$ is an $h \times (r - L)$ 
full rank matrix of known constants ($h \le r - L$), and $c_1$ is an $h$-vector of known constants. 
The hypothesis is rejected at the $\alpha$-level if $W_1 > \chi^2_h(\alpha)$, the upper $\alpha$-point of $\chi^2$ with $h$ d.f. 
Note that $\beta_0$ should not be included in the linear hypothesis since it is fixed by the design 
constraints $K'p = K'g^{-1}(X\beta) = \pi$. 

Smoothed version of ABLUE and associated Wald tests 

We can also obtain a smoothed version of ABLUE of $\beta_1$, say $\beta_1^*$, using iteration, as follows: 

$$\beta_{t+1} = \beta_t + (X'M_tX)^{-1}X'M_tH_t(\hat{p} - p_t), \quad t = 0,1,2,\ldots \qquad (2.8)$$

with starting values $M_0 = M$, $\beta_0 = (X'MX)^{-1}X'M\hat{g} = \hat{\beta}$, $H_0 = H(\hat{\beta})$ and $p_0 = p(\hat{\beta})$. 
Further, $M_t = (V_{gt} + X_0X_0')^{-1}$ with $V_{gt} = H_t\hat{V}_pH_t'$, $H_t = H(\beta_t)$ and $p_t = p(\beta_t)$, $t \ge 1$. 
At convergence, we get $\beta^* = (\beta_0^{*\prime}, \beta_1^{*\prime})'$ as the solution of the following equations: 

$$X'M(\beta)H(\beta)(\hat{p} - p(\beta)) = 0. \qquad (2.9)$$

Equations (2.9) reduce to quasilikelihood equations (McCullagh 1983) when $V_p$ is proportional 
to $V(p)$, a known function of $p$. Here, the dependence on $\beta$ is made explicit by writing 
$p = p(\beta)$, $H = H(\beta)$ and $M = (V_g + X_0X_0')^{-1} = M(\beta)$. The smoothed estimate $\beta^*$ also 
satisfies the constraints $K'p = K'g^{-1}(X\beta) = \pi$, unlike $\hat{\beta}$. The asymptotic covariance 
matrices of $\beta_1^*$ and $\hat{\beta}_1$ are identical, but $\beta_1^*$ might perform better in small samples. 
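
To make the iteration (2.8) concrete, here is a minimal sketch for the loglinear case $g(p) = \log p$, where $p(\beta) = \exp(X\beta)$ and $H = \text{diag}(1/p)$. The function name, the hypothetical proportions and the multinomial covariance matrix are assumptions for the example, not the authors' implementation.

```python
import numpy as np

def smoothed_loglinear(p_hat, Vp_hat, X, X0, max_iter=100, tol=1e-10):
    """Iterate (2.8) for g(p) = log p; returns (beta_star, p(beta_star))."""
    def weight(p):
        H = np.diag(1.0 / p)
        return H, np.linalg.inv(H @ Vp_hat @ H.T + X0 @ X0.T)

    H, M = weight(p_hat)
    # Starting value: the one-step WLS estimate (X'MX)^{-1} X'M g_hat.
    beta = np.linalg.solve(X.T @ M @ X, X.T @ M @ np.log(p_hat))
    for _ in range(max_iter):
        p = np.exp(X @ beta)
        H, M = weight(p)
        step = np.linalg.solve(X.T @ M @ X, X.T @ M @ H @ (p_hat - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, np.exp(X @ beta)

# Hypothetical 3 x 3 table with a multinomial covariance (an assumption).
p_hat = np.array([0.22, 0.15, 0.17, 0.02, 0.01, 0.01, 0.20, 0.10, 0.12])
p_hat = p_hat / p_hat.sum()
Vp = (np.diag(p_hat) - np.outer(p_hat, p_hat)) / 2505

code = {1: [1, 0], 2: [0, 1], 3: [-1, -1]}
X1 = np.array([code[i] + code[j] + [(i - 2) * (j - 2)]
               for i in (1, 2, 3) for j in (1, 2, 3)], dtype=float)
X0 = np.ones((9, 1))
X = np.hstack([X0, X1])

beta_star, p_star = smoothed_loglinear(p_hat, Vp, X, X0)
```

In this setting the fitted proportions `p_star` sum to one at convergence, which is the constraint property noted in the text for the smoothed estimate.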

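Given the model (2.1), an alternate Wald test of the hypothesis $C_1\beta_1 = c_1$ is given by 

$$W_1^* = (C_1\beta_1^* - c_1)'[C_1\,\text{est cov}(\beta_1^*)\,C_1']^{-1}(C_1\beta_1^* - c_1) \qquad (2.10)$$

which is distributed asymptotically as a $\chi^2$ with $h$ d.f., where 

$$\text{est cov}(\beta_1^*) = (\tilde{X}_1^{*\prime}M^*\tilde{X}_1^*)^{-1}, \qquad (2.11)$$

and $\tilde{X}_1^* = [I - X_0X_0'M^*]X_1$, $M^* = (V_g^* + X_0X_0')^{-1}$ with $V_g^* = H^*\hat{V}_pH^{*\prime}$ and $H^* = H(\beta^*)$. 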


Example 


The previous results were applied to a two-way table from the Canada Health Survey 
(1978-79). This survey was designed to provide reliable information on the health of Canadians. 
The information collected was made up of an interview component for the whole sample and 
a physical measures component for a subsample. A complex multistage design involving 
stratification and clustering was employed, and the estimates of cell totals or proportions were 
subjected to post-stratification on age-sex, to improve their efficiency. The reader is referred 
to Hidiroglou and Rao (1987) for a description of the survey and the procedures used for 
estimating cell counts, proportions, and their estimated variances and covariances. For the 
physical measures component, a collapsed stratum technique for variance estimation was 
employed since a single primary sampling unit was selected in some of the strata. 

Table 1 gives the estimated proportions, $\hat{p}_{ij}$, derived from the physical measures component 
in a cross-classification of fitness level (recommended = 1, minimal acceptable = 2, below 
acceptable or screened out = 3) and type of cigarette smoker (regular = 1, occasional = 2, 
never = 3). The estimated covariance matrix of the $\hat{p}_{ij}$, $\hat{V}_p$, can be obtained from the authors. 

Since both the variables in Table 1 are ordinal, we considered the following loglinear model 
with linear x linear interaction: 


$$\log p_{ij} = u + u_{1(i)} + u_{2(j)} + \gamma(v_i - \bar{v})(w_j - \bar{w}), \quad i = 1,2,3;\ j = 1,2,3 \qquad (2.12)$$


Table 1 

Estimated Cell Proportions in a 3 × 3 Table (Canada Level): 
Type of Cigarette Smoker × Fitness Level (Sample Size n = 2505), Ages 15-64 

Type of                        Fitness Level 
cigarette smoker         1           2           3 

1                     0.22005     0.14951     0.16998 
2                     0.02301     0.00962     0.01146 
3                     0.20329     0.09933     0.11374 

subject to side constraints $\sum_i u_{1(i)} = \sum_j u_{2(j)} = 0$, where $v_i$ and $w_j$ are known scores with 
means $\bar{v}$ and $\bar{w}$ respectively. For simplicity, equidistant scores were taken: $v_i = 1,2,3$; 
$w_j = 1,2,3$. The model (2.12) is of the form $g(p) = X_0\beta_0 + X_1\beta_1$, with $g_{ij}(p) = \log p_{ij}$, 
$X_0 = K = 1_9$, a $9 \times 1$ vector of 1's, $\beta_0 = u$, $\beta_1 = (u_{1(1)}, u_{1(2)}, u_{2(1)}, u_{2(2)}, \gamma)'$, and 

$$X_1' = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & -1 & -1 & -1 \\
0 & 0 & 0 & 1 & 1 & 1 & -1 & -1 & -1 \\
1 & 0 & -1 & 1 & 0 & -1 & 1 & 0 & -1 \\
0 & 1 & -1 & 0 & 1 & -1 & 0 & 1 & -1 \\
1 & 0 & -1 & 0 & 0 & 0 & -1 & 0 & 1
\end{pmatrix}.$$

Noting that $\hat{H} = \text{diag}(\hat{p}_{ij}^{-1},\ i = 1,2,3;\ j = 1,2,3)$, the Wald test of goodness-of-fit of the 
model (2.12) can be computed from (2.6), using the proportions $\hat{p}_{ij}$ in Table 1 and the 
estimated covariance matrix, $\hat{V}_p$. We obtain 


W = 3.59 


which is not significant at the 5% level compared to $\chi^2_{T-r}(0.05) = \chi^2_3(0.05) = 7.81$ (note that 
$T = 9$, $r = 6$). The Wald statistic $W$ is likely to be stable in this example since the number 
of cells $T$ (= 9) is small relative to the number of sample clusters (= 50). 

We can also conduct a test of independence, i.e. $\gamma = 0$, given the model (2.12), using $W_1$ 
given by (2.7) or $W_1^*$, based on the smoothed estimates $\beta_1^*$, given by (2.10). Noting that 
$C_1 = (0, 0, 0, 0, 1)$ and $c_1 = 0$, we obtain 

$$W_1 = 8.23, \quad W_1^* = 8.75,$$

both larger than $\chi^2_1(0.01) = 6.63$, the upper 1% point of $\chi^2$ with 1 d.f. The nested hypothesis 
of independence is therefore not tenable. 

Accepting the model (2.12), we obtain the following values of weighted least squares 
estimates, $\hat{\beta}_1$, and smoothed estimates, $\beta^*$: 

$$\hat{\beta}_1 = (0.912, -1.550, 0.339, -0.255, -0.086)'$$

$$\beta_0^* = -2.665, \quad \beta_1^* = (0.917, -1.568, 0.344, -0.262, -0.087)'.$$

The estimate $\beta^*$ can also be used to produce smoothed estimates of the $p_{ij}$, $p_{ij}^* = p_{ij}(\beta^*)$, 
which satisfy the constraint $\sum\sum p_{ij}(\beta^*) = 1$. 


3. BOX-COX TRANSFORMATION MODELS 


Logistic regression models are extensively used for the analysis of variation in the estimated 
proportions associated with a binary response variable. Suppose that the population of interest 
is partitioned into $I$ cells according to the levels of one or more factors. Let $P_i$ be the population 
response proportion in the $i$-th cell. Then a logistic regression model for the proportions 
$P_i = F_i(\beta)$ is given by 

$$\log\{F_i/(1 - F_i)\} = x_i'\beta, \quad i = 1,\ldots,I \qquad (3.1)$$

where $x_i = (x_{1i},\ldots,x_{si})'$ is an $s$-vector of known constants derived from the factor levels 
with $x_{1i} = 1$, and $\beta$ is an $s$-vector of unknown parameters. 

Guerrero and Johnson (1982) extended the applicability of logistic regression models by 
introducing an additional parameter, $\lambda$, through a Box-Cox power transformation of the odds 
ratios $F_i/(1 - F_i)$. Their model is given by 

$$\nu_i(\lambda) = \{F_i/(1 - F_i)\}^{(\lambda)} = x_i'\beta, \quad i = 1,\ldots,I \qquad (3.2)$$

where $\beta$ and $x_i$ are as in (3.1) and 

$$\{F_i/(1 - F_i)\}^{(\lambda)} = \begin{cases} \log\{F_i/(1 - F_i)\} & \text{if } \lambda = 0 \\ \lambda^{-1}\left[\{F_i/(1 - F_i)\}^{\lambda} - 1\right] & \text{if } \lambda \neq 0. \end{cases}$$

The model (3.2) includes as a special case ($\lambda = 0$) the logistic regression model (3.1). Guerrero 
and Johnson (1982) applied this model to data from the National Survey of Household Income 
and Expenditures in Mexico to explain the variation in female participation in the Mexican 
labour force. They found that a value of $\lambda = -6.63$ provided a significantly better fit than 
the logit model ($\lambda = 0$), the values of the standard chi-squared statistic being 4.8 (7 d.f.) and 
12.8 (8 d.f.) respectively. However, they applied standard methods for binomial proportions, 
ignoring the survey design. 
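
The power transformation of the odds in (3.2), and its inverse, can be written in a few lines. This is a minimal sketch; the function names are assumptions chosen for the example.

```python
import math

def boxcox_odds(F, lam):
    """Box-Cox transform of the odds F/(1-F): log-odds when lam = 0."""
    odds = F / (1.0 - F)
    if lam == 0:
        return math.log(odds)
    return (odds ** lam - 1.0) / lam

def inverse_boxcox_odds(eta, lam):
    """Recover the proportion F from the linear predictor eta = x'beta."""
    if lam == 0:
        odds = math.exp(eta)
    else:
        q = 1.0 + lam * eta
        if q <= 0:
            raise ValueError("1 + lam * eta must be positive")
        odds = q ** (1.0 / lam)
    return odds / (1.0 + odds)

logit_half = boxcox_odds(0.5, 0.0)             # log-odds of 0.5
identity_case = boxcox_odds(0.8, 1.0)          # odds - 1 when lam = 1
round_trip = inverse_boxcox_odds(boxcox_odds(0.3, -0.7), -0.7)
```

As $\lambda$ tends to zero the transformed odds approach the logit, which is why the logistic regression model is nested in (3.2).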


Pseudo MLE 


In this section, the methods of Roberts, Rao and Kumar (1987) for the logistic regression 
model are extended to the power transformation model (3.2). Due to difficulties in obtaining 
appropriate likelihood functions for general sample designs, we use "pseudo" maximum 
likelihood estimates, $\hat{\beta}$ and $\hat{\lambda}$, obtained from the product binomial likelihood equations for 
$\beta$ and $\lambda$ by replacing the simple response proportion $r_i/n_i$ with the corresponding survey 
estimate $\hat{P}_i$ of $P_i$, and $n_i/n$ with the corresponding survey estimate $\hat{W}_i$ of the domain 
proportion $W_i$. Here $r_i$ is the number of "successes" in a sample of size $n_i$ from the $i$-th cell, and 
$n = \sum n_i$. See Guerrero and Johnson (1982) for the product binomial likelihood equations. 
The pseudo maximum likelihood estimates (m.l.e.), $\hat{\theta}' = (\hat{\beta}', \hat{\lambda})$, can be obtained iteratively 
by a quasi-Newton procedure, as in Guerrero and Johnson (1982). The fitted response 
proportions are given by $\hat{F}_i = F_i(\hat{\theta})$. 

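Let $\hat{V}_P$ be the estimated covariance matrix of the survey estimates $\hat{P} = (\hat{P}_1,\ldots,\hat{P}_I)'$, and 
let 

$$\hat{B} = D(\hat{F})^{-1}D(1 - \hat{F})^{-1}(\partial F/\partial\theta)'. \qquad (3.3)$$

Here $D(\hat{F}) = \text{diag}(\hat{F}_i,\ i = 1,\ldots,I)$, $D(1 - \hat{F}) = \text{diag}(1 - \hat{F}_i,\ i = 1,\ldots,I)$ and 
$(\partial F/\partial\theta)'$ is the $I \times (s + 1)$ matrix of partial derivatives $\partial F_i/\partial\beta_j$ and $\partial F_i/\partial\lambda$ evaluated at $\hat{\theta}$: 

$$\partial F_i/\partial\beta_j = x_{ji}F_i^2(1/Q_i)^{1 + 1/\lambda}$$
$$\partial F_i/\partial\lambda = -F_i^2(Q_i\log Q_i - Q_i + 1)\lambda^{-2}(1/Q_i)^{1 + 1/\lambda}, \qquad (3.4)$$

where $Q_i = 1 + \lambda\sum_j x_{ji}\beta_j$. The estimated asymptotic covariance matrix of $\hat{\theta}$, taking account 
of the survey design, is then given by (see Roberts 1985) 

$$\text{est cov}(\hat{\theta}) = (\hat{B}'\hat{\Delta}\hat{B})^{-1}(\hat{B}'D(\hat{W})\hat{V}_PD(\hat{W})\hat{B})(\hat{B}'\hat{\Delta}\hat{B})^{-1}, \qquad (3.5)$$

where $\hat{\Delta} = \text{diag}(\hat{W}_i\hat{F}_i(1 - \hat{F}_i);\ i = 1,\ldots,I)$ and $D(\hat{W}) = \text{diag}(\hat{W}_i,\ i = 1,\ldots,I)$. 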
It is also of interest to find the standard errors of the residuals $R_i = \hat{P}_i - \hat{F}_i$ since the 
standardized residuals $R_i/\text{s.e.}(R_i)$ can be used to detect any outlying cell proportions. The 
estimated asymptotic covariance matrix of the vector of residuals $R = (R_1,\ldots,R_I)'$ is given 
by 

$$\text{est cov}(R) = \hat{A}\,\text{est cov}(\hat{P})\,\hat{A}' = \hat{V}_R, \qquad (3.6)$$

where $\text{est cov}(\hat{P}) = \hat{V}_P$ and 

$$\hat{A} = I - D(\hat{F})D(1 - \hat{F})\hat{B}(\hat{B}'\hat{\Delta}\hat{B})^{-1}\hat{B}'D(\hat{W}).$$

The square roots of the diagonal elements, $\hat{V}_{ii,R}$, of (3.6) provide the estimated standard errors 
of the $R_i$, $i = 1,\ldots,I$. 
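
The partial derivatives entering the matrix of (3.3) can be checked numerically. The sketch below codes the derivatives of $F_i$ with respect to $\beta_j$ and $\lambda$ for $\lambda \neq 0$ and compares them with central finite differences; the sign used for the $\lambda$-derivative is the one obtained by differentiating $F = Q^{1/\lambda}/(1 + Q^{1/\lambda})$ directly. Function names and the test point are assumptions.

```python
import math

def fitted_F(x, beta, lam):
    """F under the Box-Cox model (3.2) for lam != 0: odds = Q**(1/lam)."""
    q = 1.0 + lam * sum(xj * bj for xj, bj in zip(x, beta))
    odds = q ** (1.0 / lam)
    return odds / (1.0 + odds)

def dF_dbeta(x, beta, lam):
    """Analytic derivative of F with respect to each beta_j."""
    q = 1.0 + lam * sum(xj * bj for xj, bj in zip(x, beta))
    F = fitted_F(x, beta, lam)
    common = F * F * (1.0 / q) ** (1.0 + 1.0 / lam)
    return [xj * common for xj in x]

def dF_dlam(x, beta, lam):
    """Analytic derivative of F with respect to lam (note the leading minus)."""
    q = 1.0 + lam * sum(xj * bj for xj, bj in zip(x, beta))
    F = fitted_F(x, beta, lam)
    return (-F * F * (q * math.log(q) - q + 1.0) / lam ** 2
            * (1.0 / q) ** (1.0 + 1.0 / lam))

# Hypothetical evaluation point.
x, beta, lam, h = [1.0, 0.5], [0.3, -0.4], 0.5, 1e-6

num_dlam = (fitted_F(x, beta, lam + h) - fitted_F(x, beta, lam - h)) / (2 * h)
num_db0 = (fitted_F(x, [beta[0] + h, beta[1]], lam)
           - fitted_F(x, [beta[0] - h, beta[1]], lam)) / (2 * h)
```

Because $Q\log Q - Q + 1 \ge 0$, the derivative with respect to $\lambda$ is never positive: for a fixed linear predictor, increasing $\lambda$ lowers the fitted odds.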


Corrections to Standard Tests 


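The standard chi-squared and likelihood ratio tests of goodness-of-fit of the model (3.2) 
are given by 

$$X^2 = n\sum_{i=1}^{I}(\hat{P}_i - \hat{F}_i)^2\hat{W}_i/\{\hat{F}_i(1 - \hat{F}_i)\} \qquad (3.7)$$

and 

$$G^2 = 2n\sum_{i=1}^{I}\hat{W}_i\left[\hat{P}_i\log(\hat{P}_i/\hat{F}_i) + (1 - \hat{P}_i)\log\{(1 - \hat{P}_i)/(1 - \hat{F}_i)\}\right], \qquad (3.8)$$

respectively, where the term in [ ] of (3.8) equals $-\log(1 - \hat{F}_i)$ at $\hat{P}_i = 0$ and $-\log\hat{F}_i$ 
at $\hat{P}_i = 1$. 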
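
The two uncorrected statistics (3.7) and (3.8) can be computed directly from the survey estimates and fitted proportions. The following pure-Python sketch, with assumed function names and toy inputs, also handles the boundary cases $\hat{P}_i = 0$ and $\hat{P}_i = 1$ as described above.

```python
import math

def pearson_x2(n, P_hat, F_hat, W_hat):
    """Pearson statistic X^2 of (3.7)."""
    return n * sum(w * (p - f) ** 2 / (f * (1.0 - f))
                   for p, f, w in zip(P_hat, F_hat, W_hat))

def likelihood_ratio_g2(n, P_hat, F_hat, W_hat):
    """Likelihood ratio statistic G^2 of (3.8), with the limits at P = 0 and P = 1."""
    total = 0.0
    for p, f, w in zip(P_hat, F_hat, W_hat):
        if p == 0:
            term = -math.log(1.0 - f)
        elif p == 1:
            term = -math.log(f)
        else:
            term = p * math.log(p / f) + (1.0 - p) * math.log((1.0 - p) / (1.0 - f))
        total += w * term
    return 2.0 * n * total

# Toy inputs (hypothetical): one cell with an observed proportion of zero.
x2_toy = pearson_x2(10, [0.0], [0.2], [1.0])            # 10 * 0.04 / 0.16
g2_toy = likelihood_ratio_g2(10, [0.0], [0.2], [1.0])   # -20 * log(0.8)
```

Both statistics are zero when the fitted proportions reproduce the survey estimates exactly.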

Under product binomial sampling, it is well-known that both $X^2$ and $G^2$ are asymptotically 
identically distributed as a $\chi^2$ variable with $I - s - 1$ d.f., but for general sample designs 
this result is no longer valid. In fact, $X^2$ (or $G^2$) is asymptotically distributed as a weighted 
sum, $\sum\delta_kW_k$, of independent $\chi^2$ variables, $W_k$, each with 1 d.f., where the weights $\delta_k$ 
($k = 1,\ldots,I - s - 1$) can be interpreted as "generalized design effects" (see Roberts 
1985). Under product binomial sampling, $\delta_k = 1$ for all $k$, and $\sum\delta_kW_k$ reduces to $\chi^2$ with 
$I - s - 1$ d.f. 

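A first-order correction to $X^2$ (or $G^2$) is obtained by treating $X_c^2 = X^2/\hat{\delta}_{\cdot}$ or $G_c^2 = G^2/\hat{\delta}_{\cdot}$ 
as $\chi^2$ with $I - s - 1$ d.f., where 

$$(I - s - 1)\hat{\delta}_{\cdot} = \sum_k\hat{\delta}_k = n\sum_{i=1}^{I}\hat{V}_{ii,R}\hat{W}_i/\{\hat{F}_i(1 - \hat{F}_i)\} \qquad (3.9)$$

and $\hat{V}_{ii,R}$ is the estimated variance of the $i$-th residual $R_i$. 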


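A more accurate, second order correction to $X^2$ (or $G^2$), based on the Satterthwaite 
approximation to $\sum\delta_kW_k$, is obtained by treating 

$$X_S^2 = \frac{X_c^2}{1 + \hat{a}^2} \quad \text{or} \quad G_S^2 = \frac{G_c^2}{1 + \hat{a}^2} \quad \text{as } \chi^2 \text{ with } (I - s - 1)/(1 + \hat{a}^2) \text{ d.f.} \qquad (3.10)$$

Here $\hat{a}^2 = \sum(\hat{\delta}_k - \hat{\delta}_{\cdot})^2/\{(I - s - 1)\hat{\delta}_{\cdot}^2\}$ is the squared coefficient of variation of the $\hat{\delta}_k$, 
which can be computed, without evaluating the individual weights $\hat{\delta}_k$, from (3.9) and from 

$$\sum_k\hat{\delta}_k^2 = \sum_{i=1}^{I}\sum_{l=1}^{I}\hat{V}_{il,R}^2(n\hat{W}_i)(n\hat{W}_l)/\{\hat{F}_i\hat{F}_l(1 - \hat{F}_i)(1 - \hat{F}_l)\}, \qquad (3.11)$$

where $\hat{V}_{il,R}$ is the $(i,l)$-th element of $\hat{V}_R$ given by (3.6). 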
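
The first and second order corrections need only the sum and the sum of squares of the generalized design effects, which (3.9) and (3.11) express through the residual covariance matrix. A minimal pure-Python sketch follows; the function names and the two-cell toy input are assumptions.

```python
def delta_sums(n, V_R, W_hat, F_hat):
    """Sum and sum of squares of the generalized design effects, per (3.9) and (3.11)."""
    d = [n * w / (f * (1.0 - f)) for w, f in zip(W_hat, F_hat)]
    size = len(d)
    s1 = sum(V_R[i][i] * d[i] for i in range(size))
    s2 = sum(V_R[i][l] ** 2 * d[i] * d[l] for i in range(size) for l in range(size))
    return s1, s2

def corrected_tests(stat, df, sum_delta, sum_delta_sq):
    """First order (divide by mean delta) and Satterthwaite second order corrections."""
    delta_bar = sum_delta / df
    a_sq = (sum_delta_sq - df * delta_bar ** 2) / (df * delta_bar ** 2)
    first_order = stat / delta_bar
    return {
        "delta_bar": delta_bar,
        "a_sq": a_sq,
        "first_order": first_order,                # refer to chi-squared with df d.f.
        "second_order": first_order / (1.0 + a_sq),
        "second_order_df": df / (1.0 + a_sq),
    }

# Toy input (hypothetical) whose implied design effects are 1 and 3.
s1, s2 = delta_sums(1, [[0.5, 0.0], [0.0, 1.5]], [0.5, 0.5], [0.5, 0.5])
result = corrected_tests(stat=10.0, df=2, sum_delta=s1, sum_delta_sq=s2)
```

When every design effect equals one, the mean is 1 and the squared coefficient of variation is 0, so both corrected statistics reduce to the uncorrected one.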


Nested hypotheses, given the model (3.2), can also be tested by correcting the standard tests 
for nested hypotheses, but we omit this topic for simplicity (see Roberts 1985 and Kumar and 
Rao 1985 for details). It is simpler, however, to use Wald tests based on the estimates $\hat{\theta}$ and 
the associated estimated asymptotic covariance matrix. 


Example 


The previous method was applied to data from the monthly Canadian Labour Force Survey 
(October, 1980). The Labour Force Survey design employs multi-stage cluster sampling with 
two stages in the self-representing urban areas and three or four stages in the non-self- 
representing areas in each province. A detailed description of the sample design and associated 
estimation procedures for the Labour Force Survey is given in Statistics Canada (1977). 

The sample from the Labour Force Survey, for the present example, consisted of males aged 
15-64 who were in the labour force and not full-time students. Two factors, age and education, 
were chosen to explain the unemployment rates via a Box-Cox transformation model. 
Age-group levels were formed by dividing the interval [15,64] into ten groups with the $j$-th 
age group being the interval $[10 + 5j, 14 + 5j]$ for $j = 1,\ldots,10$ and then using the midpoint 
of each interval, $A_j = 12 + 5j$, as the value of age for all persons in that age group. 
Similarly, the levels of education, $E_k$, were formed by assigning to each person a value based 
on the median years of school resulting in the following six levels: 7, 10, 12, 13, 14 and 16. 
The resultant age by education cross-classification provides a two-way table of $I = 60$ survey 
estimates, $\hat{P}_{jk}$, of employment rates $P_{jk}$. The estimated covariance matrix $\hat{V}_P$ was based on 
more than 450 sample clusters. 
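
The age and education coding described above translates directly into the covariates of a 60-cell table. The sketch below builds the age intervals, their midpoints and one row of covariates per cell; the function names and the row layout (intercept, age, age squared, education) are assumptions for illustration.

```python
EDUCATION_LEVELS = (7, 10, 12, 13, 14, 16)

def age_groups():
    """Ten five-year groups [10 + 5j, 14 + 5j] with midpoints A_j = 12 + 5j."""
    return [(10 + 5 * j, 14 + 5 * j, 12 + 5 * j) for j in range(1, 11)]

def design_rows():
    """One covariate row (1, A_j, A_j^2, E_k) per age-by-education cell."""
    rows = []
    for _, _, midpoint in age_groups():
        for education in EDUCATION_LEVELS:
            rows.append((1.0, float(midpoint), float(midpoint ** 2), float(education)))
    return rows

groups = age_groups()
rows = design_rows()
```

The first group is [15, 19] with midpoint 17 and the last is [60, 64] with midpoint 62, so the ten groups cover the interval [15, 64] exactly.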

We considered the following transformation model for $P_{jk} = F_{jk}(\theta)$ involving linear and 
quadratic age effects and linear education effect: 

$$\nu_{jk}(\lambda) = \{F_{jk}/(1 - F_{jk})\}^{(\lambda)} = \beta_0 + \beta_1A_j + \beta_2A_j^2 + \beta_3E_k, \quad j = 1,\ldots,10;\ k = 1,\ldots,6. \qquad (3.12)$$

Table 2 contains the pseudo m.l.e. of $\theta = (\beta_0,\beta_1,\beta_2,\beta_3,\lambda)'$ and associated standard errors, 
and the test statistics $X^2$, $G^2$, $X_S^2$ and $G_S^2$ for testing the goodness-of-fit of the model (3.12). 
The corresponding values under the logistic regression model ($\lambda = 0$) are also given for 
comparison. 

It is clear from Table 2 that the value of $X^2$ (or $G^2$) is essentially equal to the corresponding 
value under the logistic regression model. Thus in the present example the transformation 
model provides no improvement in the fit over the logistic regression model. This is also 
clear from the value of $\hat{\lambda}$ (= 0.016) which is not significantly different from $\lambda = 0$ when 
compared to its standard error (= 0.085). The estimates of regression coefficients are essentially 
equal under the two models, but the standard errors of the $\hat{\beta}_i$ under the Box-Cox model 
are much larger than the corresponding standard errors under the logistic regression model, 
due to the large standard error associated with $\hat{\lambda}$ and the fact that the $\hat{\beta}_i$ depend on $\hat{\lambda}$. 

Table 2 

Pseudo MLE of the Parameters ($\beta'$, $\lambda$), their Standard Errors and 
Test Statistics Under the Transformation Model and under 
the Corresponding Logistic Regression Model ($\lambda = 0$) 

                 Transformation Model        Logistic Regression Model 
                 estimate       s.e.         estimate       s.e. 

$\beta_0$        -3.28          0.975        -3.10          0.247 
$\beta_1$         0.219         0.0468        0.211         0.013 
$\beta_2$        -0.00227       0.00049      -0.00218       0.00017 
$\beta_3$         0.1579        0.0385        0.1509        0.0115 
$\lambda$         0.016         0.085         -             - 

Test Statistics 
                 value          d.f.         value          d.f. 

$X^2$             99.6          55            99.8          56 
$G^2$            102.6          55           102.5          56 
$X_S^2$           40.7          39.2          23.4          24.2 
$G_S^2$           42.0          39.2          23.9          24.2 
$X_S^2(0.05)$     54.6          55            47.7          56 
$G_S^2(0.05)$     56.4          55            48.9          56 

If the survey design is ignored and the value of $X^2$ (or $G^2$) is referred to $\chi^2_{0.05}(55) = 73.3$, 
the upper 5% point of $\chi^2$ with $I - s - 1 = 55$ d.f., we would reject the model (3.12). On 
the other hand, the value of $X_S^2$ (or $G_S^2$) when adjusted to refer to $\chi^2_{0.05}(55)$, denoted as 
$X_S^2(0.05)$ (or $G_S^2(0.05)$) in Table 2, is not significant at the 5% level, indicating that the model 
provides a good fit to the data, $\hat{P}_{jk}$. 

Box and Cox (1982) and Hinkley and Runger (1984) argued that statistical inference about 
$\beta$ should proceed with the scale determined by the estimate $\hat{\lambda}$ regarded as fixed. Thus, the 
estimated covariance matrix of $\hat{\beta}$ is determined from (3.5) by replacing $\partial F/\partial\theta$ by $\partial F/\partial\beta$ in the 
expression for $\hat{B}$ (equation (3.3)). For our example, this argument would suggest that we can 
take $\lambda = 0$ and use the estimates of $\beta$ and associated standard errors (or estimated covariance 
matrix) under the logistic regression model, given in Table 2. 


4. TESTING EQUALITY OF LOGISTIC REGRESSION MODELS 


Structural changes between two time periods may be detected through tests of equality of 
parameters in the corresponding models. Such tests for standard linear regression models have 
been developed extensively in the econometric literature (see e.g., Amemiya 1985, Sec. 1.5.3). 

In this section, corrected chi-squared and likelihood ratio tests of equality of parameters 
in two logistic regression models, corresponding to two specified time periods, are obtained. 
If the hypothesis of equality is tenable, then "smoothed" (i.e., fitted) estimates of cell 
proportions for the current period can be obtained by combining the data for the two periods. 
These estimates are more efficient than the corresponding smoothed estimates based only on 
the current period data. The methodology is applied to data from the October 1980 and October 
1981 Canadian Labour Force Survey, to study year-to-year structural changes. Note that the 
data for October 1980 has already been used, in Section 3, to illustrate the fitting of Box-Cox 
power transformation models, and it was found that a logistic regression model involving linear 
and quadratic age effects and linear education effect provides a good fit to the data. 

Let $P_{ti}$ be the population response proportion in the $i$-th cell for the period $t$ (= 1,2). Then 
a logistic regression model for the proportions $P_{ti} = F_i(\beta_t) = F_{ti}$ is given by 

$$\log\{F_{ti}/(1 - F_{ti})\} = x_i'\beta_t, \quad i = 1,\ldots,I;\ t = 1,2 \qquad (4.1)$$

where $x_i$ is an $s$-vector of known constants derived from the factor levels, as in (3.1), and 
$\beta_t$ is an $s$-vector of unknown parameters for period $t$. We are interested in testing the composite 
hypothesis $\beta_1 = \beta_2\ (= \beta)$ to study structural changes between the two time periods. 
If the hypothesis is accepted, "smoothed" estimates of the proportions $P_{2i}$ for the current 
period ($t = 2$) can be obtained as $F_i(\hat{\hat{\beta}})$ where $\hat{\hat{\beta}}$ is the pseudo m.l.e. of the common 
parameter $\beta$. 


Pseudo MLE 


Let $\hat{P}_{1i}$ and $\hat{P}_{2i}$ ($i = 1,\ldots,I$) be the survey estimates based on sample sizes $n_1$ and $n_2$ 
respectively. Extending the notation in Section 3, "pseudo" maximum likelihood estimates, 
$\hat{\beta}_t$, are obtained from the product binomial likelihood equations for $\beta_t$ by replacing the simple 
response proportions $r_{ti}/n_{ti}$ with the corresponding survey estimates $\hat{P}_{ti}$ of $P_{ti}$ and $n_{ti}/n_t$ with 
the corresponding survey estimates $\hat{W}_{ti}$ of the domain proportions $W_{ti}$, thus yielding 

$$X'D(\hat{W}_t)\hat{F}_t = X'D(\hat{W}_t)\hat{P}_t, \quad t = 1,2 \qquad (4.2)$$

where $\hat{F}_t = F(\hat{\beta}_t)$ is the vector of fitted response proportions for period $t$, $D(\hat{W}_t) = 
\text{diag}(\hat{W}_{ti},\ i = 1,\ldots,I)$, and $X' = (x_1,\ldots,x_I)$. The estimates $\hat{\beta}_t$ are obtained iteratively 
by a quasi-Newton procedure. 


Under the hypothesis $\beta_1 = \beta_2\ (= \beta)$, the pseudo maximum likelihood estimates, $\hat{\hat{\beta}}$, are 
obtained by iteration from the following pseudo likelihood equations: 

$$X'D(\hat{W}_{\cdot})\hat{\hat{F}} = (n_1/n)X'D(\hat{W}_1)\hat{P}_1 + (n_2/n)X'D(\hat{W}_2)\hat{P}_2, \qquad (4.3)$$

where $D(\hat{W}_{\cdot}) = (n_1/n)D(\hat{W}_1) + (n_2/n)D(\hat{W}_2)$, $\hat{\hat{F}} = F(\hat{\hat{\beta}})$ is the vector of fitted response 
proportions or smoothed estimates of cell proportions for the current period, and 
$n = n_1 + n_2$. 
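
The pooled pseudo likelihood equations (4.3) can be solved by any standard root-finding scheme. The sketch below uses plain Newton-Raphson steps rather than the quasi-Newton procedure mentioned in the text; the function names and the small artificial data set are assumptions.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def pooled_pseudo_mle(X, P1, P2, W1, W2, n1, n2, max_iter=50, tol=1e-12):
    """Solve X'D(W.)F(beta) = (n1/n)X'D(W1)P1 + (n2/n)X'D(W2)P2 by Newton-Raphson.

    X is the I x s matrix whose i-th row is x_i'.
    """
    n = n1 + n2
    W_dot = (n1 * W1 + n2 * W2) / n
    rhs = X.T @ ((n1 * W1 * P1 + n2 * W2 * P2) / n)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        F = expit(X @ beta)
        score = rhs - X.T @ (W_dot * F)
        info = X.T @ (X * (W_dot * F * (1.0 - F))[:, None])   # X' Delta X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta, expit(X @ beta)

# Artificial check: both periods share the same true proportions.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
beta_true = np.array([0.5, -0.3])
F_true = expit(X @ beta_true)
W1 = np.array([0.25, 0.25, 0.25, 0.25])
W2 = np.array([0.10, 0.20, 0.30, 0.40])
beta_pooled, F_pooled = pooled_pseudo_mle(X, F_true, F_true, W1, W2, 1000, 1500)
```

When the two sets of survey estimates agree exactly with a common logistic model, the pooled solution reproduces that model's parameters.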


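Let $\hat{V}_P$ be the estimated covariance matrix of $(\hat{P}_1', \hat{P}_2')'$ partitioned as 

$$\hat{V}_P = \begin{pmatrix} \hat{V}_{11P} & \hat{V}_{12P} \\ \hat{V}_{21P} & \hat{V}_{22P} \end{pmatrix}.$$

Then the estimated covariance matrix of smoothed estimates $\hat{\hat{F}}$ is given by 

$$\text{est cov}(\hat{\hat{F}}) = \hat{B}\hat{V}_P\hat{B}', \qquad (4.4)$$

where 

$$\hat{B} = D(\hat{W}_{\cdot})^{-1}\hat{\Delta}X(X'\hat{\Delta}X)^{-1}X'[(n_1/n)D(\hat{W}_1),\ (n_2/n)D(\hat{W}_2)] \qquad (4.5)$$

and 

$$\hat{\Delta} = \text{diag}(\hat{W}_{\cdot i}\hat{\hat{F}}_i(1 - \hat{\hat{F}}_i),\ i = 1,\ldots,I).$$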


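If the residuals are defined as $R_t = \hat{F}_t - \hat{\hat{F}}$, then the estimated covariance matrix of 
$(R_1', R_2')'$ is given by 

$$\hat{V}_R = \begin{pmatrix} \hat{V}_{11R} & \hat{V}_{12R} \\ \hat{V}_{21R} & \hat{V}_{22R} \end{pmatrix} = \hat{A}\hat{V}_P\hat{A}'. \qquad (4.6)$$

Here 

$$\hat{A} = \begin{pmatrix} \hat{A}_{11} & \hat{A}_{12} \\ \hat{A}_{21} & \hat{A}_{22} \end{pmatrix}$$

with 

$$\hat{A}_{tt} = D(\hat{W}_{\cdot})^{-1}\hat{\Delta}X\left[(X'\hat{\Delta}_tX)^{-1}X'D(\hat{W}_t) - \frac{n_t}{n}(X'\hat{\Delta}X)^{-1}X'D(\hat{W}_t)\right],$$

and 

$$\hat{A}_{tu} = -D(\hat{W}_{\cdot})^{-1}\hat{\Delta}X(X'\hat{\Delta}X)^{-1}X'\left[\frac{n_u}{n}D(\hat{W}_u)\right], \quad t \neq u, \qquad (4.7)$$

where 

$$\hat{\Delta}_t = \text{diag}(\hat{W}_{ti}\hat{F}_{ti}(1 - \hat{F}_{ti}),\ i = 1,\ldots,I).$$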


Corrections to Standard Tests 


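The standard chi-squared and likelihood ratio tests of the nested hypothesis $\beta_1 = \beta_2$, given 
the model (4.1), are given by 

$$X^2 = X_1^2 + X_2^2 \qquad (4.8)$$

and 

$$G^2 = G_1^2 + G_2^2 \qquad (4.9)$$

where 

$$X_t^2 = n_t\sum_{i=1}^{I}(\hat{F}_{ti} - \hat{\hat{F}}_i)^2\hat{W}_{ti}/\{\hat{\hat{F}}_i(1 - \hat{\hat{F}}_i)\}, \quad t = 1,2 \qquad (4.10)$$

and 
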
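$$G_t^2 = 2n_t\sum_{i=1}^{I}\hat{W}_{ti}\left[\hat{F}_{ti}\log(\hat{F}_{ti}/\hat{\hat{F}}_i) + (1 - \hat{F}_{ti})\log\{(1 - \hat{F}_{ti})/(1 - \hat{\hat{F}}_i)\}\right], \quad t = 1,2. \qquad (4.11)$$

A first order correction to $X^2$ (or $G^2$) is obtained by treating $X_c^2 = X^2/\hat{\delta}_{\cdot}$ or $G_c^2 = G^2/\hat{\delta}_{\cdot}$ 
as $\chi^2$ with $s$ d.f., where 

$$s\hat{\delta}_{\cdot} = n_1\sum_{i=1}^{I}\hat{V}_{11R}(ii)\hat{W}_{1i}/\{\hat{\hat{F}}_i(1 - \hat{\hat{F}}_i)\} + n_2\sum_{i=1}^{I}\hat{V}_{22R}(ii)\hat{W}_{2i}/\{\hat{\hat{F}}_i(1 - \hat{\hat{F}}_i)\} \qquad (4.12)$$

and $\hat{V}_{ttR}(ij)$ is the $(i,j)$-th element of $\hat{V}_{ttR}$. A more accurate, second order correction to $X^2$ 
(or $G^2$), based on the Satterthwaite approximation, is obtained by treating 

$$X_S^2 = \frac{X_c^2}{1 + \hat{a}^2} \quad \text{or} \quad G_S^2 = \frac{G_c^2}{1 + \hat{a}^2} \quad \text{as } \chi^2 \text{ with } s/(1 + \hat{a}^2) \text{ d.f.} \qquad (4.13)$$

Here $\hat{a}^2 = (\sum_{k=1}^{s}\hat{\delta}_k^2 - s\hat{\delta}_{\cdot}^2)/s\hat{\delta}_{\cdot}^2$ which can be computed from (4.12) and the following 
formula for $\sum\hat{\delta}_k^2$: 

$$\sum_{k=1}^{s}\hat{\delta}_k^2 = n_1^2\sum_{i=1}^{I}\sum_{j=1}^{I}\frac{\hat{V}_{11R}^2(ij)\hat{W}_{1i}\hat{W}_{1j}}{\hat{\hat{F}}_i\hat{\hat{F}}_j(1 - \hat{\hat{F}}_i)(1 - \hat{\hat{F}}_j)} + n_2^2\sum_{i=1}^{I}\sum_{j=1}^{I}\frac{\hat{V}_{22R}^2(ij)\hat{W}_{2i}\hat{W}_{2j}}{\hat{\hat{F}}_i\hat{\hat{F}}_j(1 - \hat{\hat{F}}_i)(1 - \hat{\hat{F}}_j)} + 2n_1n_2\sum_{i=1}^{I}\sum_{j=1}^{I}\frac{\hat{V}_{12R}^2(ij)\hat{W}_{1i}\hat{W}_{2j}}{\hat{\hat{F}}_i\hat{\hat{F}}_j(1 - \hat{\hat{F}}_i)(1 - \hat{\hat{F}}_j)} \qquad (4.14)$$

where $\hat{V}_{12R}(ij)$ is the $(i,j)$-th element of $\hat{V}_{12R}$. 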


Example 


The previous method was applied to data from the October 1980 and October 1981 Canadian 
Labour Force Survey, to study year-to-year structural changes. 

The logistic regression model involving linear and quadratic age effects and linear education 
effect provided a good fit to data from both periods with the following estimates of $\beta_t$: 

$$\hat{\beta}_1: \{-3.08,\ 0.211,\ -0.00218,\ 0.1505\}$$
$$\hat{\beta}_2: \{-3.05,\ 0.179,\ -0.00169,\ 0.1707\},$$

where $\log\{F_{tjk}/(1 - F_{tjk})\} = \beta_{t0} + \beta_{t1}A_j + \beta_{t2}A_j^2 + \beta_{t3}E_k$, $j = 1,\ldots,10$; $k = 1,\ldots,6$ 
and $F_{tjk}$ is the fitted employment rate in the $(j,k)$-th cell for period $t$. One cell was omitted 
in the fitting since the domain sample size $n_i$ is zero for the current period. 

Turning to the test of the hypothesis $\beta_1 = \beta_2$ given the logistic regression models, we 
obtained the following values of $X^2$, $G^2$, $X_c^2$, $G_c^2$ and $X_S^2$, $G_S^2$: 


> iad 90 dana, Coke! aa Ga 


$$G^2 = 42.2, \quad G_c^2 = 24.6, \quad G_S^2 = 24.4.$$

Also $s/(1 + \hat{a}^2) = 4/(1.0089) = 3.965 \approx 4$. By referring $X_S^2$ or $G_S^2$ to $\chi^2_{0.05}(4) = 9.49$, the 
upper 5% point of $\chi^2$ with 4 d.f., we reject the hypothesis $\beta_1 = \beta_2$ at the 5% level, indicating 
significant year-to-year structural changes for the month of October. The data for the two time 
periods, therefore, should not be pooled to get smoothed estimates of unemployment rates, 
$1 - \hat{\hat{F}}_{jk}$, for the current period. 
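
The arithmetic linking the reported likelihood ratio values can be retraced in a few lines. The sketch below uses only figures quoted in this example (42.2, 24.6, 1.0089 and the critical value 9.49); the variable names are assumptions.

```python
s = 4                      # number of parameters tested
one_plus_a_sq = 1.0089     # 1 + a^2, as reported
G2 = 42.2                  # uncorrected likelihood ratio statistic
G2_c = 24.6                # first order corrected value
chi2_crit = 9.49           # upper 5% point of chi-squared with 4 d.f.

implied_delta_bar = G2 / G2_c          # mean generalized design effect
df_S = s / one_plus_a_sq               # Satterthwaite degrees of freedom
G2_S = G2_c / one_plus_a_sq            # second order corrected value
reject = G2_S > chi2_crit
```

The implied mean design effect is about 1.7, and because the squared coefficient of variation is below 0.01 the first and second order corrections nearly coincide.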


5. POLYTOMOUS RESPONSE MODELS 


A variety of models has been suggested in the literature when the response variable is 
polytomous. The variety of models reflects, in part, the different scales of measurement possible 
for polytomous response variables, unlike binary response variables. In the main, there are 
nominal responses where any permutation of the response categories is equally valid, and 
ordinal responses where there is a natural ordering of the response categories. 

Suppose that the population of interest is partitioned into $I$ cells (or domains) according 
to the levels of one or more factors. Let $P_j(i)$ be the population proportion in the $i$-th cell having 
the $j$-th response ($j = 1,\ldots,J + 1$) so that $\sum_{j=1}^{J+1}P_j(i) = 1$ ($i = 1,\ldots,I$). Then a general 
polytomous response model for the proportions $P_j(i)$ is given by 

$$P_j(i) = F_{ij}(\theta), \quad i = 1,\ldots,I;\ j = 1,\ldots,J + 1 \qquad (5.1)$$

where $\theta$ is an $r$-vector of unknown parameters ($r < IJ$) and $F_{ij}(\theta)$ is a function of known 
form. In the nominal case, Haberman (1982) and others proposed the following model: the 
"multinomial logits" $\log P_j(i) - \sum_{k=1}^{J+1}\log P_k(i)(J + 1)^{-1}$ are assumed to be unknown 
linear functions of $x_i$, the $s$-vector of known constants derived from the factor levels, i.e., 

$$F_{ij}(\theta) = \exp(x_i'\beta_j)\Big/\sum_{k=1}^{J+1}\exp(x_i'\beta_k), \quad i = 1,\ldots,I;\ j = 1,\ldots,J + 1 \qquad (5.2)$$

with $\sum\beta_k = 0$. Because of the latter constraint on the $\beta_k$, (5.2) may be expressed as 

$$F_{ij}(\theta) = \exp(x_i'\beta_j)\Big/\left[\sum_{k=1}^{J}\exp(x_i'\beta_k) + \prod_{k=1}^{J}\exp(-x_i'\beta_k)\right], \quad i = 1,\ldots,I;\ j = 1,\ldots,J. \qquad (5.3)$$


Note that (5.3) reduces to the usual logistic regression model in the special case of binary 
response. 
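
The multinomial logit probabilities under the sum-to-zero constraint can be computed as follows. This is a minimal pure-Python sketch; the function name and the example coefficients are assumptions.

```python
import math

def multinomial_logit_probs(x, betas):
    """Category probabilities for J + 1 responses given beta_1, ..., beta_J.

    The last coefficient vector is fixed by the constraint that the beta_k sum
    to zero, so beta_{J+1} = -(beta_1 + ... + beta_J).
    """
    s = len(x)
    last = [-sum(b[m] for b in betas) for m in range(s)]
    etas = [sum(xm * bm for xm, bm in zip(x, b)) for b in list(betas) + [last]]
    shift = max(etas)                      # guards against overflow in exp
    weights = [math.exp(e - shift) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Three response categories (J = 2), hypothetical coefficients.
probs3 = multinomial_logit_probs([1.0, 2.0], [[0.2, -0.1], [-0.5, 0.3]])

# Binary response (J = 1): the model is logistic in 2 * x'beta.
x, beta = [1.0, 2.0], [0.1, 0.2]
probs2 = multinomial_logit_probs(x, [beta])
eta = sum(xm * bm for xm, bm in zip(x, beta))
logistic = 1.0 / (1.0 + math.exp(-2.0 * eta))
```

In the binary case the first probability equals $1/\{1 + \exp(-2x_i'\beta_1)\}$, which is the logistic form with the coefficient vector doubled.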


In the ordinal case, a simple model which also has the feature of being invariant under the 
grouping of response categories is given by (McCullagh 1980) 


$$\log\{C_j(i)/(1 - C_j(i))\} = \nu_j - x_i'\beta, \quad i = 1,\ldots,I;\ j = 1,\ldots,J \qquad (5.4)$$
where $C_j(i) = \sum_{k=1}^{j}P_k(i)$ denotes the $j$-th cumulative probability in the $i$-th domain, and $\theta' = 
(\nu_1,\ldots,\nu_J,\beta')$. To express (5.4) in the form (5.1), we note that $P_i = L^{-1}C_i$, where $P_i = 
(P_1(i),\ldots,P_J(i))'$, $C_i = (C_1(i),\ldots,C_J(i))'$ and $L^{-1}$ is a $J \times J$ nonsingular matrix with 1 
in the diagonal, $-1$ in the $(i + 1, i)$-th position ($i < J$) and 0 elsewhere. 
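
Converting cumulative logits into category probabilities amounts to differencing the cumulative probabilities, which is what multiplication by $L^{-1}$ does. The sketch below assumes the cumulative logit takes the form $\nu_j - x_i'\beta$, the sign convention of McCullagh (1980); if the opposite sign is intended, the sign of `eta` should be flipped. The function name and inputs are illustrative assumptions.

```python
import math

def cumulative_logit_probs(x, nu, beta):
    """Probabilities of the J + 1 ordered categories from J cut-points nu.

    Assumes logit C_j = nu_j - x'beta with nu_1 < nu_2 < ... < nu_J.
    """
    eta = sum(xm * bm for xm, bm in zip(x, beta))
    cumulative = [1.0 / (1.0 + math.exp(-(v - eta))) for v in nu]
    # Differencing C_1, C_2 - C_1, ..., C_J - C_{J-1} is the action of L^{-1}.
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(nu))]
    probs.append(1.0 - cumulative[-1])     # last category
    return probs

probs = cumulative_logit_probs([1.0], nu=[-1.0, 0.5], beta=[0.3])
```

With increasing cut-points every category probability is positive and the probabilities sum to one.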
Pseudo MLE 


As before, we use the pseudo m.l.e. $\hat\theta$, obtained from the product multinomial likelihood
equations for $\theta$ by replacing the simple response proportions $n_{ij}/n_i$ with the corresponding
survey estimates $\hat P_{j(i)}$, and $n_i/n$ with the corresponding survey estimate $\hat W_i$ of the domain
proportion $W_i$. Here $n_{ij}$ is the number of units with the $j$th response in a sample of size $n_i$
from the $i$th domain and $n = \sum n_i$. The fitted response proportions are then given by
$\hat F = F(\hat\theta) = (\hat F_1', \ldots, \hat F_I')'$, where $\hat F_i = (\hat F_{i1}, \ldots, \hat F_{iJ})'$ and $\hat F_{ij} = F_{ij}(\hat\theta)$.

Let $\hat V_P$ be the estimated covariance matrix of the survey estimates
$\hat P = (\hat P_{1(1)}, \ldots, \hat P_{J(1)}, \ldots, \hat P_{1(I)}, \ldots, \hat P_{J(I)})'$, and $\hat M = (\partial F/\partial\theta)'$, the $IJ \times r$ matrix
of partial derivatives $\partial F_{ij}/\partial\theta_k$ calculated at $\hat\theta$. Also, let $\hat\Omega_i = \mathrm{diag}(\hat F_i) - \hat F_i \hat F_i'$ and
$\hat\Omega = \mathrm{diag}(\hat\Omega_i, i = 1, \ldots, I)$. The expressions for the partial derivatives $\partial F_{ij}/\partial\theta_k$ for the
models (5.3) and (5.4) are given in Roberts (1985). The estimated asymptotic covariance matrix of
$\hat\theta$, taking account of the survey design, is then given by (see Roberts 1985)

$$\mathrm{est\,cov}(\hat\theta) = (\hat M'\hat V\hat M)^{-1}(\hat M'\hat V\hat V_P\hat V'\hat M)(\hat M'\hat V\hat M)^{-1}, \qquad (5.5)$$

where $\hat V = (D(\hat W) \otimes I)\hat\Omega^{-1}$ and $D(\hat W) = \mathrm{diag}(\hat W_i, i = 1, \ldots, I)$. In the special case of
product multinomial sampling, $\hat V_P = \hat V^{-1}/n$ and (5.5) reduces to $(\hat M'\hat V\hat M)^{-1}/n$.

The vector of residuals, $R = \hat P - \hat F$, is also of interest, since it may be useful in detecting
model deviations. The estimated asymptotic covariance matrix of $R$ is given by

$$\mathrm{est\,cov}(R) = \hat G\hat V_P\hat G', \qquad (5.6)$$

where $\hat G = I - \hat M(\hat M'\hat V\hat M)^{-1}\hat M'\hat V$.
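The reduction of (5.5) under product multinomial sampling can be verified in one step. The worked derivation below uses only the definitions above, together with the observation that $\hat V$ is symmetric because it is block diagonal with symmetric blocks $\hat W_i\hat\Omega_i^{-1}$.

```latex
\begin{aligned}
\mathrm{est\,cov}(\hat\theta)
  &= (\hat M'\hat V\hat M)^{-1}\,\hat M'\hat V\,\bigl(\hat V^{-1}/n\bigr)\,\hat V'\hat M\,(\hat M'\hat V\hat M)^{-1} \\
  &= n^{-1}\,(\hat M'\hat V\hat M)^{-1}\,(\hat M'\hat V\hat M)\,(\hat M'\hat V\hat M)^{-1} \\
  &= (\hat M'\hat V\hat M)^{-1}/n .
\end{aligned}
```

The same substitution shows that the middle factor of the sandwich form (5.5) is what carries the design information: any departure of $\hat V_P$ from $\hat V^{-1}/n$ changes the estimated covariance of $\hat\theta$.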


Corrections to standard tests 


For simplicity, we consider only the Pearson chi-squared test of goodness-of-fit of the model
(5.1). It is given by

$$X^2 = n \sum_{i=1}^{I} \hat W_i \sum_{j=1}^{J+1} (\hat P_{j(i)} - \hat F_{ij})^2 / \hat F_{ij}. \qquad (5.7)$$

Under independent multinomial sampling in each of the domains, it is well known that $X^2$ is
asymptotically distributed as a $\chi^2$ variable with $IJ - r$ d.f.

To test the nested hypothesis $\theta_2 = 0$, given the model (5.1), let $\tilde\theta_1$ be the pseudo m.l.e. of
$\theta_1$ and $\tilde F$ be the corresponding vector of fitted response proportions, where $\theta' = (\theta_1', \theta_2')$,
$\theta_1$ is $q \times 1$ and $\theta_2$ is $u \times 1$ ($q + u = r$). The Pearson chi-squared test of the nested hypothesis
is then given by

$$X^2(2|1) = n \sum_{i=1}^{I} \hat W_i \sum_{j=1}^{J+1} (\hat F_{ij} - \tilde F_{ij})^2 / \tilde F_{ij}, \qquad (5.8)$$

which is asymptotically distributed as $\chi^2$ with $u$ d.f. under independent multinomial sampling
in each of the domains. However, for a general sample design, $X^2$ and $X^2(2|1)$ are both
asymptotically distributed as weighted sums of independent $\chi^2$ variables, each with 1 d.f.,
where the weights can be interpreted as "generalized design effects" of particular linear
transformations of $\hat P$ (Roberts 1985).
A first-order correction to $X^2(2|1)$ is obtained by treating

$$X_c^2(2|1) = X^2(2|1)/\hat\delta_\cdot(2|1) \ \text{as} \ \chi^2 \ \text{with} \ u \ \text{d.f.}, \qquad (5.9)$$

where $\hat\delta_\cdot(2|1)$ is obtained by replacing $\theta'$ by $(\tilde\theta_1', 0')$ and $V_P$ by $\hat V_P$ in the following
definition for $\delta_\cdot(2|1)$:

$$u\,\delta_\cdot(2|1) = \sum_{t=1}^{u} \delta_t(2|1) = \mathrm{tr}\,D(2|1). \qquad (5.10)$$

Here, tr denotes the trace operator and $D(2|1)$ is a generalized design effects matrix given by

$$D(2|1) = (A_2'VA_2)^{-1}(A_2'VV_PV'A_2), \qquad (5.11)$$

where $V_P$ is the covariance matrix of $\hat P$, $V = (D(W) \otimes I)\Omega^{-1}$, $\Omega$ is the block
diagonal matrix with $\Omega_i = \mathrm{diag}(F_i) - F_iF_i'$, $i = 1, \ldots, I$, $F_i = F_i(\theta)$, and
$A_2 = [I - M_1(M_1'VM_1)^{-1}M_1'V]M_2$, where $M_1 = (\partial F/\partial\theta_1)'$ and $M_2 = (\partial F/\partial\theta_2)'$.

A more accurate, second-order correction to $X^2(2|1)$, based on the Satterthwaite
approximation, is obtained by treating

$$X_S^2(2|1) = X_c^2(2|1)/(1 + \hat a(2|1)^2) \ \text{as} \ \chi^2 \ \text{with} \ u/(1 + \hat a(2|1)^2) \ \text{d.f.} \qquad (5.12)$$

Here $\hat a(2|1)^2$ is obtained by replacing $\theta'$ by $(\tilde\theta_1', 0')$ in the following definition of $a(2|1)^2$:

$$a(2|1)^2 = \Big\{ \sum_{t=1}^{u} \delta_t(2|1)^2 - u\,\delta_\cdot(2|1)^2 \Big\} \Big/ \{u\,\delta_\cdot(2|1)^2\}, \qquad (5.13)$$

where

$$\sum_{t=1}^{u} \delta_t(2|1)^2 = \mathrm{tr}\,D(2|1)^2. \qquad (5.14)$$

The corrections to the goodness-of-fit test $X^2$ are obtained as special cases of (5.9) and (5.12)
by treating the model as nested within a saturated model (i.e., a model where the unknown
parameter $\theta$ is of length $IJ$).
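The arithmetic of the two corrections can be sketched as follows. This is an illustration only: the function name is hypothetical, the generalized design effects are assumed to be supplied already estimated, and the numerical inputs are invented rather than taken from the survey.

```python
def rao_scott_corrections(x2, deffs):
    """First- and second-order corrections of a Pearson statistic.

    x2    : uncorrected statistic X^2(2|1)
    deffs : estimated generalized design effects delta_1..delta_u,
            assumed to be available to the analyst
    """
    u = len(deffs)
    delta_bar = sum(deffs) / u                      # average deff, as in (5.10)
    sum_sq = sum(d * d for d in deffs)              # sum of squares, as in (5.14)
    a2 = (sum_sq - u * delta_bar ** 2) / (u * delta_bar ** 2)  # (5.13)
    x2_c = x2 / delta_bar                           # (5.9): refer to chi-squared(u)
    x2_s = x2_c / (1.0 + a2)                        # (5.12)
    df_s = u / (1.0 + a2)                           # Satterthwaite d.f.
    return {"delta_bar": delta_bar, "a2": a2,
            "x2_c": x2_c, "x2_s": x2_s, "df_s": df_s}


# Hypothetical inputs: X^2 = 7.2 with u = 2 generalized design effects.
result = rao_scott_corrections(7.2, [2.4, 1.2])

# When all design effects are equal, a^2 = 0 and the two corrections agree.
equal = rao_scott_corrections(5.0, [1.5, 1.5, 1.5])
```

The quantity `a2` is the squared coefficient of variation of the design effects, so the second-order correction matters only when the design effects differ appreciably from one another.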


Example 


The previous methods were applied to data from the Canada Health Survey (1978-79). A 
brief description of the survey is provided in Section 2. 

The data set examined consisted of the estimated counts of females aged 20-64 cross-classified 
by frequency of breast self-examination (with the 3 categories: monthly, quarterly, less often 
or never), education (with the 3 categories: secondary or less, some post-secondary, post- 
secondary) and age (with the 3 categories: 20-24, 25-44, 45-64). 

The frequency of breast self-examination was considered to be the response variable, while 
education and age were taken as explanatory variables, so that the number of responses, J + 1, 
equalled 3 and the number of domains, /, was 9. Both response and explanatory variables are 
ordered. 
Table 3
Survey Estimates of Cumulated Probabilities

                 Age      Education           C_1(ik)        C_2(ik)
i = 1, k = 1     20-24    ≤ Secondary         .25            .49
       k = 2              < Post-Secondary    [illegible]    .41
       k = 3              = Post-Secondary    [illegible]    .47
i = 2, k = 1     25-44    ≤ Secondary         [illegible]    .50
       k = 2              < Post-Secondary    [illegible]    .44
       k = 3              = Post-Secondary    .26            .44
i = 3, k = 1     45-64    ≤ Secondary         .28            [illegible]
       k = 2              < Post-Secondary    .24            .62
       k = 3              = Post-Secondary    .29            .56

Table 4
Statistics for Testing Goodness of Fit and Nested Hypotheses

            Goodness of Fit        Nested Hypothesis
            (Age & Education)      (Age only)
X²          37.7                   7.1
X²_c        21.6                   3.8
X²_S        18.5*                  3.7*
δ̂.          1.75                   1.9
â²          0.83                   0.1

* The Satterthwaite statistic has been adjusted to refer to the same χ² value as X²_c.


The following model for the cumulated probabilities, of the type described in equation (5.4),
was considered:

$$\log\{C_{j(ik)}/(1 - C_{j(ik)})\} = \nu_j + \beta a_i + e_k, \quad j = 1, 2; \; i = 1, 2, 3; \; k = 1, 2, 3, \qquad (5.15)$$

where $C_{j(ik)}$ is the $j$th cumulated probability for the $i$th age group and $k$th education group.
As well, $a_i = A_i - \bar A$, where $A_i$ is the midpoint of the $i$th age interval, and $e_k$ is the effect of
the $k$th education group ($\sum e_k = 0$), ignoring the order of the education categories. Table 3
contains the survey estimates of the cumulated proportions. Table 4 contains the test statistics
$X^2$, $X_c^2$ and $X_S^2$ for testing the goodness of fit of (5.15) and also for testing the nested
hypothesis of no education effect, $e_k = 0$ for $k = 1, 2$.

First, considering the goodness of fit of (5.15), if the survey design is ignored and the value
of $X^2$ is referred to $\chi^2_{0.95}(13) = 22.4$, the upper 5% point of $\chi^2$ with $IJ - 5 = 13$ d.f., we
would reject the model. On the other hand, the value of $X_c^2$, or the value of $X_S^2$ when adjusted
to refer to $\chi^2_{0.95}(13)$, is not significant at the 5% level, indicating that the model provides a
good fit to the data.

For testing of the nested hypothesis, the value of $X_c^2$, or the value of $X_S^2$ when adjusted to
refer to $\chi^2_{0.95}(2) = 5.99$, is not significant at the 5% level, indicating that the nested hypothesis
of no education effect is tenable.
6. SOFTWARE


Implementation of the methodology of the previous sections requires two stages of com- 
putation — calculation of a vector of proportions, along with its estimated covariance matrix, 
and then calculation of model estimates, test statistics and their adjustments. 

Surveys like the Canada Health Survey and the Labour Force Survey, from which examples 
have been presented, have complex designs and large data bases. Because of these two factors, 
calculation of covariance matrices was done on a mainframe computer. Custom SAS and 
Fortran programs were used for this purpose. 

Computations required for the fitting and testing of goodness-of-fit models and sub- 
hypotheses were done either on the mainframe computer using SAS (and the MATRIX 
procedure in particular), or on a microcomputer using the GAUSS programming package. 

These programs are available to other analysts at Statistics Canada. 
COMMENT


ROBERT E. FAY¹


The authors have made an excellent contribution to the literature on the analysis of data from 
complex samples. By examining in turn four different models for categorical data: i) a log-linear 
model for a cross-classification; ii) a modification of the approach of Box and Cox to the
transformation of binary data; iii) a problem of inference about parameters of a logistic regression
model; and iv) a polytomous response model, the authors present solutions to important indi- 
vidual problems and illustrate the ways in which these flexible approaches to inference can be 
extended to other models for categorical data from complex samples. The applications are con- 
nected by an underlying theory, much of it previously appearing in Rao and Scott (1984), but this 
paper usefully presents in greater detail the implications of the general theory for specific models. 

An omission from the paper is understandable but worth noting: for each model illustrated 
in the paper, replication provides an alternative strategy that, at times, may also be more con- 
venient. In particular, the replication theory is complete for each of the applications, i), ii), 
and iv), to cross-classified data. In each case, tests of overall fit and comparisons of nested 
models can be assessed with the jackknifed chi-square test (Fay 1985) and standard errors for 
the parameters obtained through replication. 

Replication also can provide standard errors and covariances for parameters of logistic 
regression models, as in iii), enabling in some cases a Wald-type test for equality of sets of regres- 
sion parameters. It also appears likely that the jackknifing approach extends to the likelihood- 
ratio chi-square test in such situations involving continuous variables, although a firm proof 
of this conjecture is clearly required before application can be recommended. My point in calling 
attention to replication as a competing strategy for the problems presented in the paper is not 
to imply that it represents a methodologically superior approach to the methods of Rao and 
Scott (1984); instead, the availability of this methodology provides an additional choice to solve 
these and similar problems of inference. For example, the focus on replication for the estima- 
tion of variances from the current demographic surveys at the U.S. Census Bureau provides 
the potential to carry out analyses such as those presented in the paper. 

I also want to point out that the methods presented and the analogues from replication theory
have a potential importance beyond the realm of design-based inference from complex sample 
surveys, which is the focus of the paper. One of these involves the use of multiple imputation 
or related approaches intended to represent the uncertainty due to missing data. The implied 
interpretation of variance within the domain of design-based inference can be extended to 
include uncertainty from missing data without requiring changes to the methodology presented 
in the paper. The general methodology may also be applicable to some problems of inference 
from complex designed experiments, in which the design poses problems of clustering or 
stratification similar to complex sample surveys. 

Of the four models discussed, however, I suggest that the Box and Cox transformation not be 
applied without consideration of alternative strategies, such as transformation of the x-variables 
instead. My own inclination would be to favor an analysis on a logistic scale, with possibly 
transformed predictors, unless the adaptation of the Box and Cox transformation obtains some 
distinct advantage, such as offering an additive model on the transformed scale in an instance 
where the logistic model does not provide as successful a fit without interaction terms. 

I am delighted to have the opportunity to commend the authors on a useful and instructive
paper.


¹ Robert E. Fay, U.S. Bureau of the Census, Washington, D.C. 20233.
COMMENT


C.J. SKINNER¹


This paper provides an excellent discussion of a variety of applications of weighted least 
squares (WLS) and pseudo maximum likelihood (PML) procedures to categorical data. Its clear 
presentation and use of real survey examples will, I hope, help to encourage survey analysts 
to take account of complex designs in their analyses. As the authors indicate, analytical 
statistical procedures which take account of complex designs have been developed extensively 
in recent years (see e.g. Skinner, Holt and Smith 1989) and are even beginning to be referred 
to in standard computer software (e.g. SAS 1985, pp 61-67). 

Commenting first on some specific aspects of the paper, I found Section 5 on polytomous 
variables to be especially valuable, given the wide occurrence of such data in surveys. A prop- 
erty of ordinal variables is that they may often be expected to possess monotonic relationships 
and so, for example, lack of monotonicity between the fitted values of $C_{1(ik)}$ (or $C_{2(ik)}$) and
the education variable k in Table 3 makes the result of the corrected tests, that there is no 
evidence of an education effect, more plausible than the result of the uncorrected test. 

The discussion of testing equality of two logistic regression models in Section 4 also seemed 
to me to be practically useful, although it would still seem to be possible theoretically to for- 
mulate this test as one of a nested hypothesis within the framework of Roberts, Rao and Kumar 
(1987). 

Section 3 provides a useful illustration of how PML may be applied to general parametric 
models for categorical data. It is, however, gratifying that the more complex transformation 
model provides no significant improvement in fit over the logistic regression model, since the 
interpretation of the parameters of the transformation model is more difficult. For example, 
for the logistic model the coefficient for education may be interpreted as implying that the odds 
of being employed are increased by 16% for each additional year of education for males of 
a given age (exp (.1509) = 1.16), whereas this interpretation is not generally available for the 
transformation model when $\lambda \neq 0$.
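The odds interpretation quoted above follows from exponentiating the logistic coefficient. A minimal sketch is given below; the four-year contrast is a hypothetical illustration and does not appear in the comment.

```python
import math

coef = 0.1509  # education coefficient quoted in the comment

# Multiplicative change in the odds of being employed per additional
# year of education, holding age fixed.
odds_ratio_per_year = math.exp(coef)
percent_increase = 100.0 * (odds_ratio_per_year - 1.0)

# Under the logistic model the effect compounds multiplicatively, so a
# hypothetical contrast of four additional years multiplies the odds by:
odds_ratio_four_years = math.exp(4 * coef)
```

No such constant odds multiplier exists on a general transformed scale, which is the interpretive cost of the transformation model that the comment points to.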

On a more general note I would be interested in the authors’ views on the relative merits 
of WLS and PML. In the paper, these methods are presented quite separately, although both 
procedures would seem to be potentially applicable to a very wide class of models for categorical 
data under complex designs. Indeed both procedures are also applicable to models with con- 
tinuous variables (Skinner, Holt and Smith 1989, Chapter 3); WLS requires just a statistic 
consistent for a known function of the parameters together with a consistent estimate of the 
covariance matrix of the statistic (Fuller 1984, Corollary 2), whereas PML is applicable very 
widely as described in Binder (1983). As a basis for discussion I list below a number of criteria 
on which WLS and PML might be compared; M1-M3 are relevant even under multinomial
sampling, C1-C3 are specific to complex designs. 


M1  Flexibility: WLS may be more adaptable than PML for complex problems, e.g. involving
    structural zeros.

M2  Computation: WLS computation tends to have a more standard form.

M3  Small cell counts: WLS is more sensitive to small counts, especially zeros.

C1  Adaptability of multinomial methods to complex designs: WLS seems more easily
    adaptable.

C2  Efficiency: Under multinomial sampling WLS is usually asymptotically equivalent to
    PML (which is then just standard ML). It might be conjectured that WLS will always
    be at least as efficient as PML under complex designs, although this presupposes a 1-1
    correspondence between WLS and PML estimation problems. If WLS is more efficient,
    is the gain usually negligible (cf. Scott and Holt 1982)? Are there general results here?

C3  Degrees of freedom: WLS estimators and associated Wald tests may be unstable if the
    degrees of freedom used to estimate $V_P$ are low.


¹ C.J. Skinner, University of Southampton, United Kingdom.
COMMENT


E.A. MOLINA¹


I would like to congratulate the authors on bringing together some recent methods devel- 
oped for analyzing categorical data arising from sample surveys. The paper should be extremely 
useful for survey analysts who wish to take into account the impact of survey designs on the 
practical aspects of the analysis of survey data. In particular, it is important to emphasize that 
the methods discussed cover two different situations arising in practice: so called primary 
analyses, in which the researcher has all the relevant information at hand, and secondary 
analyses, in which the data provided do not include enough information about the popula- 
tion units to enable the calculation of full covariance matrices of the sample estimators. 

The methods covered require the existence of a structural model for the data. There are situa- 
tions, however, in which it is difficult to specify a single structural model that adequately 
describes categorical data. In large scale surveys there is often need to screen out many cross 
classifications at minimal cost. In such cases the use of measures of association is a common 
alternative. These non parametric methods were extended to sample survey data by Molina 
and Smith (1986, 1988). 

For the primary analysis of survey data the paper concentrates on weighted least squares 
and Wald tests. The results in Scott, Rao and Thomas (1989) are summarized and the rela- 
tionship with quasi-likelihood is mentioned. I think that an important conclusion from that 
paper should be included in this section, namely the need to take into account the survey con- 
straints K’p(XG) = a when using quasi-likelihood methods. The reader may not be aware 
of the importance of the careful choice of the g-inverse in equation (2.9). Quasi-likelihood 
methods are now widely used and the relationship with weighted least squares methods is a 
relevant one. In fact, quasi-likelihood functions represent an interesting alternative for the 
analysis of survey data. However, there are practical problems since the method requires that 
we specify the covariance matrix as a function of p, the variance function. Quasi-likelihoods 
are largely determined by these variance functions (see, e.g., Morris 1982, and Jørgensen 1987).
If a matrix of estimates is given instead of a function, the method would be equivalent to the 
use of a normal distribution. 

Most of the paper is devoted to methods involving pseudo likelihoods. Since secondary 
analyses constitute the most common situation in practice, the methods presented are likely 
to be extensively used by survey analysts. I would like, however, to discuss some alternatives. 

The study of the impact of survey design on Guerrero and Johnson’s (1982) transforma- 
tion models is an important addition to the literature. However, Nelder and Pregibon (1987) 
have proposed a family of functions, the extended quasi-likelihoods, that avoid some impor- 
tant disadvantages of transformation models and can be fitted with GLIM. If design effects 
are available, their methods can be adapted to survey data by incorporating them either in the 
variance functions or in the form of weights. Alternatively, design variables may be used to 
adjust the dispersion parameter in the models. In both cases, one advantage is that we can use 
the goodness of fit statistics and standard errors produced by GLIM under these models to 
examine the data without the introduction of further corrections. 

These comments apply in general to the use of pseudo-likelihoods. The effect of ignoring
the survey design may be treated as an increase or decrease in the expected variability that may
be modelled as overdispersion or underdispersion by means of quasi-likelihoods or extended
quasi-likelihoods. See, e.g., Pocock et al. (1981), Breslow (1984), Williams (1982), among
others. As an example, I reanalyzed the data in Table 1. The analysis given in the paper is the
correct one, since it incorporates the true covariance matrix. Suppose, however, that this matrix
is not available and that only the cell design effects are at hand. Using GLIM I fitted model
(2.12) with a Poisson error ignoring the sampling scheme. This gives $X^2 = 5.68$, $G^2 = 5.67$.
The Rao and Scott (1987) approximation for the chi-square statistic gives $X^2(\delta_\cdot) = 5.68/2.25 = 2.52$.
For the independence model the uncorrected values are $X^2 = 18.22$, $G^2 = 18.22$, and the
correction gives $X^2(\delta_\cdot) = 18.22/1.65 = 11.04$. What can be done if the deffs
are not available? A simple quasi-likelihood approach to overdispersion is to estimate the mean
deviance for the larger model, $D = 5.68/3 = 1.89$, and to use the inverse of this value as a
weight (or as a new scale parameter). This gives $X^2 = 3.01$ for model (2.12) and $X^2 = 9.65$
for the independence model. The correct approach here is to use the excess in deviance (the
difference between the log-likelihood ratio statistics) to test $\gamma = 0$, since $G^2$ will equate the
degrees of freedom for the larger model. The value is 6.65, which is just significant at the 1%
level. Both analyses are in agreement with the correct analysis given in the paper, but in other
situations it may not be so. The quasi-likelihood model presented here is equivalent to assuming
that the actual covariance matrix is a multiple of the one obtained under multinomial sampling,
a model that may perform badly in several situations. The advantage is that it can be used when
the only information available is that given by the variability inherent in the data, and the
analysis can be performed in a standard statistical package like GLIM. If the deffs are available,
other models involving them may be proposed, and a paper is in preparation.

There is, however, no completely satisfactory substitute for an analysis involving the actual
covariance matrix. The objective of this contribution is to highlight other possibilities when
the full covariance matrix is not known. Quasi-likelihoods offer a fertile ground for further
exploration, particularly in relation to survey data. The paper under discussion presents several
alternatives and is an important contribution to the field.


¹ E.A. Molina, Universidad Simon Bolivar, Caracas and University of Southampton, United Kingdom.
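The arithmetic behind the reanalysis of Table 1 in this comment can be retraced with the short sketch below. The variable names are hypothetical, the inputs are the statistics quoted in the comment, and the dispersion is rounded to two decimals as in the text, so the results agree with the quoted figures only up to rounding in the second decimal place.

```python
# Uncorrected statistics quoted in the comment.
x2_model, g2_model, df_model = 5.68, 5.67, 3   # model (2.12)
x2_indep, g2_indep = 18.22, 18.22              # independence model

# Deff-based corrections, using the average design effects quoted.
x2_model_deff = x2_model / 2.25
x2_indep_deff = x2_indep / 1.65

# Mean-deviance scaling when no design effects are available: the
# dispersion is estimated from the larger model and applied to both.
dispersion = round(x2_model / df_model, 2)
x2_model_scaled = x2_model / dispersion
x2_indep_scaled = x2_indep / dispersion

# Scaled excess deviance for the single extra parameter.
excess = (g2_indep - g2_model) / dispersion
```

Scaling by the mean deviance forces the larger model's statistic to equal roughly its degrees of freedom, so only the excess deviance between the two models carries information in this approach.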
RESPONSE FROM THE AUTHORS


We thank the three discussants, Fay, Molina and Skinner, for their useful comments and 
for suggesting additional methods useful in the analysis of cross-classified data from complex 
sample surveys. 


(i) Response to comments of R.E. Fay 


We agree with Fay that replication methodology and associated jackknife chi-squared 
tests provide viable alternatives to the methods presented here, provided the survey design 
permits the use of a replication method such as the jackknife or the balanced half-sample repli- 
cation. His CPLX program indeed offers a comprehensive analysis option whenever estimates 
are available at the individual replicate level. Also, as noted in the Introduction, Fay’s jack- 
knife tests and Rao-Scott corrections have performed well under quite general conditions in 
simulation studies, unlike the Wald tests based on weighted least squares. Rao-Scott correc- 
tions are, however, also applicable to survey designs not permitting the use of a replication 
method. 

The software systems for the Canada Health Survey and the Canadian Labour Force Survey 
were set up to readily provide the estimated covariance matrix of cell estimates but not the 
replicate level estimates. As a result, the implementation of jackknife tests would have required 
some changes in the software systems. 

We are also thankful to Fay for pointing out that the methods presented here, and the 
analogues from replication theory, can also handle some problems of inference from complex 
designed experiments involving clustering and stratification. Indeed, one of us (J.N.K. Rao) 
recently used Rao-Scott type methods to fit dose-response models and to test hypotheses in 
teratological studies involving animal litters as experimental units (Rao and Colin 1989). These 
methods do not assume specific models for the intra-litter correlations, unlike other methods 
proposed in this area. 

We considered Box-Cox transformation models since Guerrero and Johnson (1982) obtained 
significantly better fits on some Mexican data compared to the logit model. We agree with Fay, 
however, that the Box-Cox models should not be applied without consideration of alternative 
strategies, such as transforming the predictors. As noted by Fay, the Box-Cox approach would 
be useful in these cases where it would lead to additive models on the transformed scale while 
the logit model would require interaction terms. 


(ii) Response to comments of E.A. Molina 


Molina is correct in saying that measures of association can be used to screen out many cross 
classifications at minimal cost. His joint work with T.M.F. Smith on extending the classical 
theory for measures of association to sample survey data involving clustering and stratifica- 
tion is an important contribution. 

As noted in the Introduction, we assumed throughout the paper that the user has access 
to a full estimated covariance matrix of cell estimates. However, such detailed information 
is often not available for secondary analyses, and in fact even cell deffs may not be available, 
as pointed out by Molina. In the latter case, Rao and Scott (1987) showed that an F statistic 
used in GLIM for testing a nested hypothesis, such as $\gamma = 0$ given the model (2.12), is
asymptotically valid whenever the covariance matrix of cell estimates, $V$, is proportional to the
multinomial covariance matrix, $P$. The F-test, however, is less powerful than the Rao-Scott
tests, unless the denominator degrees of freedom are high. In the latter case, the F-test might
work well even if the condition $V \propto P$ is not satisfied (see Rao and Scott 1987, p. 392).


186 Rao, Kumar and Roberts: Analysis of Categorical Survey Data 


For the data in Table 1, $F = 6.63$ for testing $\gamma = 0$ given the model (2.12), which is not
significant at the 5% level compared to $F_{1,3}(0.05) = 10.13$, the upper 5% point of the F
distribution with 1 and 3 degrees of freedom (d.f.). On the other hand, the Wald test, W, and the
Rao-Scott test, both requiring detailed information on the estimated covariance matrix, are
significant at the 1% level compared to $\chi^2_1(0.01) = 6.63$. The F-test, therefore, appears to
be less powerful here since the denominator d.f. is only 3. Molina’s proposed test is, in fact,
equal to F, but he was treating F as a $\chi^2$ variable with 1 d.f., which may not be valid due to
the small denominator d.f.

The GLIM method does not provide a statistic for testing the goodness-of-fit of a model. 
Some information on the design effects is necessary for getting a valid test of goodness-of-fit. 


(iii) Response to comments of C.J. Skinner 


Skinner noted that the test of equality of two logistic regression models in Section 4 might 
be formulated as a test of a nested hypothesis within the framework of Roberts, Rao and 
Kumar (1987), using dummy x-variables. The framework of Roberts, Rao and Kumar, however,
assumes one fixed sample size $n$ whereas in Section 4 we have two fixed sample sizes $n_1$
and $n_2$ for the two time periods. As a result, their results would need careful modification in
order to be applicable to the present case of test of equality of two logistic regression models. 
Moreover, the dummy variable approach would involve the determination of estimates of 2s 
parameters iteratively, whereas the approach in Section 4 requires two iterative solutions, each 
involving only s parameters. Thus, the dummy variable approach could lead to convergence 
problems if s is not small. 

We treated WLS with singular covariance matrices separately in Section 2 since the logit- 
type models in the remaining sections do not involve singular covariance matrices. WLS can 
also be applied to logit-type models but the resulting estimators and associated Wald tests 
may be unstable if the degrees of freedom associated with the estimated covariance matrix, 
$\hat V_P$, are low (criterion C3 of Skinner). The six criteria proposed by Skinner for comparing
WLS and PML are very useful. We prefer PML mainly on the basis of criterion C3. Regarding 
the relative efficiency of WLS and PML estimators under complex designs, no general results 
are available, but WLS estimators are not likely to be significantly more efficient (and in 
fact, may be less efficient) if the degrees of freedom associated with the estimated covariance 
matrix are low. Clearly, further research on the relative efficiency of WLS and PML estimators 
would be useful. 
Simultaneous Confidence Intervals for Proportions
Under Cluster Sampling 


D. ROLAND THOMAS¹


ABSTRACT 


The paper describes a Monte Carlo study of simultaneous confidence interval procedures for k > 2
proportions, under a model of two-stage cluster sampling. The procedures investigated include: (i)
standard multinomial intervals; (ii) Scheffé intervals based on sample estimates of the variances of cell
proportions; (iii) Quesenberry-Hurst intervals adapted for clustered data using Rao and Scott’s first
and second order adjustments to $X^2$; (iv) simple Bonferroni intervals; (v) Bonferroni intervals based
on transformations of the estimated proportions; (vi) Bonferroni intervals computed using the critical
points of Student’s t. In several realistic situations, actual coverage rates of the multinomial procedures
were found to be seriously depressed compared to the nominal rate. The best performing intervals,
from the point of view of coverage rates and coverage symmetry (an extension of an idea due to
Jennings), were the t-based Bonferroni intervals derived using log and logit transformations. Of the
Scheffé-like procedures, the best performance was provided by Quesenberry-Hurst intervals in
combination with first-order Rao-Scott adjustments.


KEY WORDS: Simultaneous inference; Complex surveys; Monte Carlo. 


1. INTRODUCTION 


Survey results are often presented as estimated proportions (or percentages) of popula- 
tion units belonging to two or more distinct categories. Examples include many sociological 
studies (see for example Black and Myles 1986), marketing studies and opinion polls. As 
noted by Fitzpatrick and Scott (1987), inference on category proportions is often based on 
single binomial confidence intervals, even when more than two category proportions are being 
examined. This paper describes a study of several procedures for constructing simultaneous 
confidence intervals for the proportions $\pi_i$, $i = 1, \ldots, k$, of population units belonging to
each of k distinct categories, using data from a two-stage cluster sample. Standard 
simultaneous confidence interval procedures for categorical data problems, reviewed by 
Hochberg and Tamhane (1987), are based on the assumption of multinomially distributed
sample counts, and are thus appropriate for data from simple random samples. When the 
data have been collected using sample survey designs that involve clustering, standard pro- 
cedures are likely to perform poorly, as is the case when standard multinomial based tests 
are applied to data from complex sample surveys. In the latter case, it has been shown by 
many workers that clustering can lead to unacceptably high Type I error rates (see, for 
example, Fellegi 1980; Rao and Scott 1979, 1981; Holt, Scott and Ewing 1980). For 
simultaneous confidence intervals, therefore, it is natural to expect that clustering will lead 
to coverage probabilities that are lower than multinomial theory indicates. 

Estimation of simultaneous confidence intervals (SCI’s) is an important adjunct to 
hypothesis testing. The present study thus represents a natural follow-up to Thomas and 
Rao’s (1987) investigation of test statistics for the simple goodness of fit problem, under 
simulated cluster sampling. In this paper, adaptations of the standard SCI procedures are
proposed, and their performance in small samples is evaluated using Monte Carlo techniques. 

The cluster sampling model that is used in the Monte Carlo study is described in Section 
2, and the SCI procedures to be examined are presented in Section 3. In Section 4, the design 
of the Monte Carlo experiment is described, together with procedures for evaluating confidence 
interval performance. The main results of the study are presented in Sections 5 through 7, 
followed in Section 8 by some final conclusions and recommendations. 


2. THE CLUSTER SAMPLING MODEL 


This investigation will focus on two-stage sampling in which a k-category sample of m units 
is drawn independently from each of r sampled clusters. 

For a sample of size $n = mr$, let $\mathbf{m} = (m_1, \ldots, m_{k-1})'$ represent the category counts for
the whole sample, where $m_k = n - \sum_{i=1}^{k-1} m_i$. In terms of proportions, let
$\hat{\boldsymbol{\pi}} = (\hat{\pi}_1, \ldots, \hat{\pi}_{k-1})' = \mathbf{m}/n$ be the vector of category proportions for the full sample.
Further, define $\boldsymbol{\pi} = E(\hat{\boldsymbol{\pi}})$, where $E$ denotes expectation under a suitable model of cluster
sampling, and let $\mathbf{V}/n$ represent the $(k-1) \times (k-1)$ covariance matrix of $\hat{\boldsymbol{\pi}}$. Following Rao and
Scott (1981), the ordinary design effect for the linear combination $\mathbf{c}'\hat{\boldsymbol{\pi}}$ of category proportions
is $\mathbf{c}'\mathbf{V}\mathbf{c}/\mathbf{c}'\mathbf{P}\mathbf{c}$, where $\mathbf{P}$ is $n$ times the covariance matrix of $\hat{\boldsymbol{\pi}}$ under multinomial sampling,
i.e., $\mathbf{P} = \mathrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi}\boldsymbol{\pi}'$, and $\mathbf{c}$ is a vector of dimension $k - 1$. The largest design effect
taken over all possible linear combinations is given by the largest eigenvalue of the design effect
matrix $\mathbf{D} = \mathbf{P}^{-1}\mathbf{V}$. The eigenvalues of $\mathbf{D}$, denoted in decreasing order by $\lambda_1, \lambda_2, \ldots, \lambda_{k-1}$, were
termed generalized design effects by Rao and Scott (1981), and provide a quantitative summary of
the variance inflation associated with a particular design, relative to simple random sampling.
Under the multinomial distribution, corresponding to simple random sampling from large
populations, $\lambda_i = 1 \ \forall i$. Designs involving clustering usually yield generalized design effects
greater than one on the average, i.e., $\bar{\lambda} = \sum_{i=1}^{k-1} \lambda_i/(k-1) > 1$. Furthermore, studies of
real survey data (Hidiroglou and Rao 1987; Rao and Thomas 1988) reveal significant variation
among the $\lambda_i$'s. This is conveniently represented by their coefficient of variation, given by


$$a = \left\{ \sum_{i=1}^{k-1} \lambda_i^2 \Big/ \left[ (k-1)\,\bar{\lambda}^2 \right] - 1 \right\}^{1/2}. \qquad (1)$$


A suitable model of cluster sampling must therefore be capable of generating generalized design
effects such that $\bar{\lambda} > 1$ and $a > 0$.
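
As a minimal sketch of equation (1), the following computes the mean and coefficient of variation of a set of generalized design effects. The function name `design_effect_summary` is hypothetical; the second set of eigenvalues is the one-large, six-equal pattern quoted later in Section 7 for k = 8.

```python
import math


def design_effect_summary(eigenvalues):
    """Return (mean, coefficient of variation) of generalized design effects."""
    n = len(eigenvalues)  # n = k - 1 eigenvalues
    lam_bar = sum(eigenvalues) / n
    mean_sq = sum(lam * lam for lam in eigenvalues) / n
    # Equation (1); max(..., 0.0) guards against tiny negative rounding error.
    a = math.sqrt(max(mean_sq / lam_bar ** 2 - 1.0, 0.0))
    return lam_bar, a


# Multinomial sampling: every eigenvalue equals one, so a = 0.
print(design_effect_summary([1.0, 1.0, 1.0, 1.0]))

# One distinct and six equal eigenvalues (the k = 8 pattern of Section 7).
lams = [2 + 2 * math.sqrt(3)] + [2 - math.sqrt(3) / 3] * 6
print(design_effect_summary(lams))
```

The second call should return a mean close to 2 and a coefficient of variation close to 0.707, matching the settings used in the study.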

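Brier (1981) proposed a model of two-stage cluster sampling in which individual clusters
are represented by vectors of category probabilities, $\mathbf{p}_\ell = (p_{\ell 1}, p_{\ell 2}, \ldots, p_{\ell, k-1})'$, $\ell = 1, \ldots, r$,
where for each cluster, $p_{\ell k} = 1 - \sum_{i=1}^{k-1} p_{\ell i}$. Each $\mathbf{p}_\ell$ was independently drawn from a
Dirichlet distribution with mean $\boldsymbol{\pi}$, i.e. $E(\mathbf{p}_\ell) = \boldsymbol{\pi}$, and second stage sampling of the $m$ units
per cluster was multinomial, conditional on the realized value of $\mathbf{p}_\ell$ for that cluster. Let the
vector of counts for each cluster be $\mathbf{m}_\ell = (m_{\ell 1}, \ldots, m_{\ell, k-1})'$, with $m_{\ell k} = m - \sum_{i=1}^{k-1} m_{\ell i}$.
Thus for the full sample, $\mathbf{m} = \sum_{\ell=1}^{r} \mathbf{m}_\ell$, and in terms of proportions, $\hat{\boldsymbol{\pi}} = r^{-1} \sum_{\ell=1}^{r} \hat{\boldsymbol{\pi}}_\ell$, where
$\hat{\boldsymbol{\pi}}_\ell = \mathbf{m}_\ell/m$. Brier (1981) showed that under this model, $E(\hat{\boldsymbol{\pi}}) = \boldsymbol{\pi}$ and $V(\hat{\boldsymbol{\pi}}) = d\mathbf{P}/n$, i.e.,
the covariance matrix of $\hat{\boldsymbol{\pi}}$ is proportional to the multinomial covariance matrix, with the
constant of proportionality $d > 1$. Under this model, the design effect matrix is given by
$\mathbf{D} = d\,\mathbf{I}_{k-1}$, where $\mathbf{I}_{k-1}$ is the identity matrix of order $k - 1$. Thus $\lambda_i = d \ \forall i$, so that $\bar{\lambda} = d$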
and $a = 0$. Brier's model can therefore represent variance inflation ($\bar{\lambda} > 1$), but cannot
represent the unequal generalized design effects encountered in practice. Thomas and Rao
(1987) used an extension of Brier's model in which the first stage $\mathbf{p}_\ell$'s are sampled
independently from a mixture of two Dirichlet distributions, representing a population
composed of two distinct classes of clusters. This model, which is a special case of that proposed
by Rao and Scott (1979), generates one distinct and $k - 2$ equal eigenvalues, with $\bar{\lambda}$ and $a$
being explicit functions of the Dirichlet parameters. This greatly facilitates the design of the
Monte Carlo study by allowing for convenient control of the values of the clustering measures
$\bar{\lambda}$ and $a$. Since it satisfies the basic requirements outlined above ($\bar{\lambda} > 1$, $a > 0$), Thomas and
Rao's (1987) model will be used in this study.
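
The simpler single-Dirichlet (Brier) version of this model can be sketched as follows; the two-component mixture used in the study is not reproduced here. The function names and the concentration parameter `c` are assumptions made for illustration. The sketch relies on the standard Dirichlet-multinomial result that a Dirichlet with parameters $c\boldsymbol{\pi}$ gives a constant design effect $d = (m + c)/(1 + c)$, which is not stated in the text.

```python
import random


def simulate_brier_clusters(pi, c, m, r, rng):
    """Draw r clusters of m units; return the per-cluster proportion vectors."""
    k = len(pi)
    clusters = []
    for _ in range(r):
        # First stage: Dirichlet(c * pi) draw via normalized gamma variates.
        g = [rng.gammavariate(c * p, 1.0) for p in pi]
        total = sum(g)
        p_cluster = [x / total for x in g]
        # Second stage: multinomial sample of m units given p_cluster.
        counts = [0] * k
        for cat in rng.choices(range(k), weights=p_cluster, k=m):
            counts[cat] += 1
        clusters.append([cnt / m for cnt in counts])
    return clusters


def cell_design_effects(clusters, m):
    """Estimated proportions and cell design effects v_ii / (pi_i (1 - pi_i))."""
    r = len(clusters)
    k = len(clusters[0])
    n = m * r
    pi_hat = [sum(cl[i] for cl in clusters) / r for i in range(k)]
    deffs = []
    for i in range(k):
        ss = sum((cl[i] - pi_hat[i]) ** 2 for cl in clusters)
        v_ii = n * ss / (r * (r - 1))  # between-cluster variance estimate
        deffs.append(v_ii / (pi_hat[i] * (1 - pi_hat[i])))
    return pi_hat, deffs


rng = random.Random(1989)
# m = 10 and c = 8 give d = (10 + 8) / (1 + 8) = 2 under the stated assumption.
clusters = simulate_brier_clusters([0.2] * 5, c=8.0, m=10, r=2000, rng=rng)
pi_hat, deffs = cell_design_effects(clusters, m=10)
print([round(p, 3) for p in pi_hat])
print([round(d, 2) for d in deffs])
```

With a large number of clusters, all five estimated cell design effects should lie near 2, illustrating why Brier's model yields $a = 0$.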


3. SIMULTANEOUS CONFIDENCE INTERVAL PROCEDURES 


3.1 Scheffé Intervals 


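A standard Scheffé argument, based on the asymptotically exact probability statement

$$P\left\{ n(\hat{\boldsymbol{\pi}} - \boldsymbol{\pi})'\, \mathbf{V}^{-1} (\hat{\boldsymbol{\pi}} - \boldsymbol{\pi}) \le \chi^2_{k-1}(\alpha) \right\} = 1 - \alpha \qquad (2)$$

leads to simultaneous confidence intervals for linear combinations, $\boldsymbol{\ell}'\boldsymbol{\pi}$, of the category
probabilities, where $\boldsymbol{\ell}$ is a vector of dimension $(k - 1)$. Appropriate choices of $\boldsymbol{\ell}$ then yield SCI's
on the individual cell probabilities given by

$$\pi_i \in \left\{ \hat{\pi}_i \pm A^{1/2} (\hat{v}_{ii}/n)^{1/2} \right\}, \quad i = 1, \ldots, k, \qquad (3)$$

where $A = \chi^2_{k-1}(\alpha)$ is the upper $\alpha$ percent point of a chi-squared distribution on $k - 1$
degrees of freedom, and $\hat{v}_{ii}$ is the $i$th diagonal element of a consistent estimator of $\mathbf{V}$ (as
$r \to \infty$) given by

$$\hat{\mathbf{V}} = \frac{n}{r(r-1)} \sum_{\ell=1}^{r} (\hat{\boldsymbol{\pi}}_\ell - \hat{\boldsymbol{\pi}})(\hat{\boldsymbol{\pi}}_\ell - \hat{\boldsymbol{\pi}})'. \qquad (4)$$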


Note that when the endpoint of an interval lies outside [0, 1], definition (3) must be modified
by truncating the endpoint to 0 or 1 as appropriate. For multinomial sampling, $\hat{v}_{ii}$ can be
replaced by $\hat{\pi}_i(1 - \hat{\pi}_i)$, in which case the Scheffé intervals reduce to those proposed by Gold
(1963). The latter will be referred to as Scheffé-Gold intervals. The Scheffé intervals of equation
(3) will be conservative, i.e., will have coverage exceeding $(1 - \alpha)$ asymptotically since they
make use of only a finite number of the available $\boldsymbol{\ell}$ directions (see Miller 1981, page 63). In fact,
they will become very conservative as $k$ increases, as can be shown using the following argument
due to Goodman (1965). The coverage of the Scheffé intervals is equal to one minus the
probability of occurrence of at least one of the events $\{ (\hat{\pi}_i - \pi_i)^2/(\hat{v}_{ii}/n) > \chi^2_{k-1}(\alpha) \}$,
$i = 1, \ldots, k$; since the random variables $(\hat{\pi}_i - \pi_i)^2/(\hat{v}_{ii}/n)$ each have chi-squared distributions
on one degree of freedom asymptotically, the probability of each individual event can
be evaluated. Using the Bonferroni inequality, lower bounds for the coverage can then be
obtained; for a nominal coverage of 95% with $k = 3$, 5, 8 and 12, these bounds are .9571,
.9896, .9986 and .9999 respectively.
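
The Bonferroni lower bounds quoted above can be approximated with a short stdlib-only sketch. The helper names (`chi2_sf`, `chi2_upper_point`, `scheffe_coverage_bound`) are hypothetical; the chi-squared tail probability uses the standard closed-form series for integer degrees of freedom, and the critical value is found by bisection.

```python
import math
from statistics import NormalDist


def chi2_sf(x, df):
    """Upper tail probability P(chi-squared_df > x) for integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    root = math.sqrt(x)
    tail = 2 * (1 - NormalDist().cdf(root))
    term, total = root, 0.0  # term = x^(j - 1/2) / (1 * 3 * ... * (2j - 1))
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return tail + 2 * NormalDist().pdf(root) * total


def chi2_upper_point(alpha, df):
    """Value A with P(chi-squared_df > A) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def scheffe_coverage_bound(k, alpha=0.05):
    """Bonferroni lower bound on asymptotic coverage of the k Scheffe intervals."""
    A = chi2_upper_point(alpha, k - 1)
    return 1 - k * chi2_sf(A, 1)


for k in (3, 5, 8, 12):
    print(k, round(scheffe_coverage_bound(k), 4))
```

The printed bounds should agree with the quoted values to about three decimal places; small differences in the last digit can arise from rounding of the critical values.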


3.2 Modified Quesenberry-Hurst Intervals


Under the assumption of multinomial sampling, Quesenberry and Hurst (1964) solved the
large sample probability statement

$$P\left\{ n \sum_{i=1}^{k} \frac{(\hat{\pi}_i - \pi_i)^2}{\pi_i} \le \chi^2_{k-1}(\alpha) \right\} = 1 - \alpha \qquad (5)$$

for the cell probabilities $\pi_i$, to get the SCI's

$$\pi_i \in \left\{ \frac{\hat{\pi}_i + A/2n \pm (A/n)^{1/2} \left[ \hat{\pi}_i(1 - \hat{\pi}_i) + A/4n \right]^{1/2}}{1 + A/n} \right\}, \quad i = 1, \ldots, k. \qquad (6)$$

Under multinomial sampling, these intervals are asymptotically equivalent to Scheffé and
Scheffé-Gold intervals, and will therefore exhibit similar asymptotic conservativeness.

Quesenberry-Hurst (Q-H) intervals can be adapted for use with clustered survey data using
the first and second order corrections to the distribution of $X^2$ proposed by Rao and Scott
(1981). Corresponding first and second order SCI's are obtained by replacing $A$ in equation
(6) by

$$A^{(1)} = \hat{\lambda} A \quad \text{and} \quad A^{(2)} = \hat{\lambda}(1 + \hat{a}^2)\, \chi^2_{\nu}(\alpha) \qquad (7)$$

respectively, where $\nu = (k - 1)/(1 + \hat{a}^2)$ and $\hat{\lambda}$, an estimate of the mean of the generalized
design effects, is given by (Rao and Scott, 1981)

$$\hat{\lambda} = (k - 1)^{-1} \sum_{i=1}^{k} (1 - \hat{\pi}_i)\, \hat{d}_i, \qquad (8)$$

where $\hat{d}_i$, $i = 1, \ldots, k$, is an estimated cell design effect given by $\hat{d}_i = \hat{v}_{ii}/\hat{\pi}_i(1 - \hat{\pi}_i)$. The
coefficient of variation, $a$, is estimated by replacing $\bar{\lambda}$ in equation (1) by $\hat{\lambda}$, and $\sum \lambda_i^2$ by the
estimate $\sum \hat{\lambda}_i^2 = \sum_{i=1}^{k} \sum_{j=1}^{k} \hat{v}_{ij}^2 / \hat{\pi}_i \hat{\pi}_j$. It turns out (see Thomas 1989) that the second order modified
intervals are unnecessarily conservative, so that only the first-order modified Q-H intervals
will be discussed in the remainder of the paper.
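
A minimal sketch of the first-order modified Q-H intervals follows. Each interval is the set of $\pi_i$ satisfying $n(\hat{\pi}_i - \pi_i)^2 \le A\,\pi_i(1 - \pi_i)$, with $A$ inflated by the estimated mean design effect of equation (8). The function names and the example inputs are assumptions for illustration; the critical value is passed in by the caller, and for k = 3 the chi-squared point on two degrees of freedom has the closed form $-2\ln\alpha$.

```python
import math


def qh_interval(p_hat, n, A):
    """Endpoints of {pi : n (p_hat - pi)^2 <= A pi (1 - pi)}."""
    centre = p_hat + A / (2 * n)
    half = math.sqrt(A / n) * math.sqrt(p_hat * (1 - p_hat) + A / (4 * n))
    denom = 1 + A / n
    return (centre - half) / denom, (centre + half) / denom


def mean_design_effect(p_hat, v_diag):
    """Equation (8): weighted average of the estimated cell design effects."""
    k = len(p_hat)
    d = [v / (p * (1 - p)) for p, v in zip(p_hat, v_diag)]
    return sum((1 - p) * di for p, di in zip(p_hat, d)) / (k - 1)


def modified_qh_intervals(p_hat, v_diag, n, A):
    """First-order modification: replace A by lambda_hat * A."""
    lam_hat = mean_design_effect(p_hat, v_diag)
    return [qh_interval(p, n, lam_hat * A) for p in p_hat]


# Hypothetical example: k = 3, n = 500, every cell design effect equal to 2.
p_hat = [0.5, 0.3, 0.2]
v_diag = [2.0 * p * (1 - p) for p in p_hat]
A = -2 * math.log(0.05)  # upper 5% point of chi-squared on 2 df
for p, (lo, hi) in zip(p_hat, modified_qh_intervals(p_hat, v_diag, 500, A)):
    print(p, round(lo, 4), round(hi, 4))
```

Because the intervals come from inverting a quadratic in $\pi_i$, the endpoints always lie inside [0, 1] and no truncation is needed.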


3.3 Simple Bonferroni Intervals 


Since (loosely speaking) each $\hat{\pi}_i$ is asymptotically $N(\pi_i, v_{ii}/n)$, the intervals

$$\pi_i \in \left\{ \hat{\pi}_i \pm (\hat{v}_{ii}/n)^{1/2} z_{\alpha'/2} \right\}, \quad i = 1, \ldots, k, \qquad (9)$$

will have large sample coverage at least $(1 - \alpha)$ by the Bonferroni inequality, where $\alpha' = \alpha/k$
and $z_{\alpha'/2}$ is the upper $\alpha'/2$ percent point of the standard normal distribution. Intervals (9) are
equivalent to Scheffé intervals with $A$ in equation (3) replaced by $A^{(B)} = \chi^2_1(\alpha')$. As noted
by Goodman (1965), they will be shorter than Scheffé intervals for the usual values of $\alpha$ and
$k$; e.g., $\alpha$ = 1%, 5%, or 10%. Goodman's (1965) multinomial Bonferroni intervals are given
by equation (9) with $\hat{v}_{ii}$ replaced by $\hat{\pi}_i(1 - \hat{\pi}_i)$. All endpoints of simple Bonferroni intervals
that lie outside [0, 1] will be truncated to 0 or 1 as appropriate.
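
The simple Bonferroni intervals of equation (9) can be sketched as follows, with the variances taken from the between-cluster estimator of equation (4). The function names and the small data set are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import NormalDist


def variance_diagonal(cluster_props, m):
    """Return (p_hat, v_diag, n) from a list of per-cluster proportion vectors."""
    r = len(cluster_props)
    k = len(cluster_props[0])
    n = m * r
    p_hat = [sum(c[i] for c in cluster_props) / r for i in range(k)]
    v_diag = [
        n * sum((c[i] - p_hat[i]) ** 2 for c in cluster_props) / (r * (r - 1))
        for i in range(k)
    ]
    return p_hat, v_diag, n


def simple_bonferroni(p_hat, v_diag, n, alpha=0.05):
    """z-based intervals of equation (9), truncated to [0, 1]."""
    k = len(p_hat)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))  # upper alpha'/2 point
    intervals = []
    for p, v in zip(p_hat, v_diag):
        half = z * math.sqrt(v / n)
        intervals.append((max(0.0, p - half), min(1.0, p + half)))
    return intervals


# Hypothetical data: r = 4 clusters of m = 10 units, k = 3 categories.
props = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [0.5, 0.3, 0.2]]
p_hat, v_diag, n = variance_diagonal(props, m=10)
print(simple_bonferroni(p_hat, v_diag, n))
```

In this toy data set the third category has the same proportion in every cluster, so its estimated variance is zero and its interval collapses to a point; this illustrates the kind of instability that arises when the number of clusters is small.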


3.4 Transformed Bonferroni Intervals 


For suitably smooth $g$, $g(\hat{\pi}_i)$ will be asymptotically $N(g(\pi_i), [g'(\pi_i)]^2 v_{ii}/n)$, where $g'(\pi_i)$
denotes the partial derivative $\partial g(\pi_i)/\partial \pi_i$ evaluated at $\pi_i$. Bonferroni intervals can then be
obtained by inverting corresponding intervals on the $g(\pi_i)$'s, giving

$$\pi_i \in g^{-1}\left\{ g(\hat{\pi}_i) \pm g'(\hat{\pi}_i)\, (\hat{v}_{ii}/n)^{1/2} z_{\alpha'/2} \right\}. \qquad (10)$$

Three $g$ functions will be investigated: the square root $g_1(\pi_i) = \pi_i^{1/2}$ (previously investigated
by Bailey 1980, for the case of multinomial sampling); the natural logarithm $g_2(\pi_i) = \ln(\pi_i)$;
and the logit $g_3(\pi_i) = \ln(\pi_i/(1 - \pi_i))$. Interval endpoints that lie outside [0, 1] will again
be truncated to 0 or 1 as necessary.

Transformed Bonferroni intervals based on a jackknifed estimator of the variance of $g(\hat{\pi}_i)$
have also been examined (see Thomas 1989). It was found that there is little advantage to using 
jackknifed variance estimates; Taylor series variance estimates are therefore recommended for 
their simplicity. Intervals based on jackknife variance estimates will not be considered further 
in this paper. 
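
The three transformed intervals can be sketched with a small table of transformations, using Taylor series variances. The names `TRANSFORMS` and `transformed_interval` are hypothetical, the critical value `crit` is supplied by the caller, and the sketch assumes $0 < \hat{\pi}_i < 1$.

```python
import math
from statistics import NormalDist

# (g, derivative of g, inverse of g) for the three transformations in the text.
TRANSFORMS = {
    "sqrt": (math.sqrt, lambda p: 0.5 / math.sqrt(p), lambda x: max(x, 0.0) ** 2),
    "log": (math.log, lambda p: 1.0 / p, math.exp),
    "logit": (
        lambda p: math.log(p / (1 - p)),
        lambda p: 1.0 / (p * (1 - p)),
        lambda x: 1.0 / (1.0 + math.exp(-x)),
    ),
}


def transformed_interval(p_hat, v_ii, n, crit, kind="logit"):
    """Invert a symmetric interval on g(pi); requires 0 < p_hat < 1."""
    g, dg, g_inv = TRANSFORMS[kind]
    half = crit * dg(p_hat) * math.sqrt(v_ii / n)
    lo, hi = g_inv(g(p_hat) - half), g_inv(g(p_hat) + half)
    # Truncate to [0, 1]; only the log and square root versions can need this.
    return max(0.0, lo), min(1.0, hi)


# Hypothetical example: k = 5, alpha = 5%, cell design effect 2, n = 500.
crit = NormalDist().inv_cdf(1 - 0.05 / (2 * 5))
for kind in ("sqrt", "log", "logit"):
    print(kind, transformed_interval(0.2, 2 * 0.2 * 0.8, 500, crit, kind))
```

For a small proportion such as 0.2, the log and logit intervals extend further above the estimate than below it, which is the asymmetry that the untransformed intervals lack.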


3.5 Variants of the Above Intervals 


Scheffé Intervals: Following Thomas and Rao (1987), Scheffé intervals can be modified by
replacing the critical constant $A$ in equation (3) by $A^{(F)} = (k - 1)(r - 1)(r - k + 1)^{-1}
F_{(k-1),(r-k+1)}(\alpha)$, where $F_{(k-1),(r-k+1)}(\alpha)$ is the upper $\alpha$ percent point of an $F$ distribution
on $(k - 1)$ and $(r - k + 1)$ degrees of freedom.

Quesenberry-Hurst Intervals: Variants of the modified Quesenberry-Hurst (Q-H) intervals 
can also be defined, corresponding to the F forms of the first and second order corrected test 
statistic proposed by Thomas and Rao (1987). These again turn out to be conservative, and 
will not be considered further. 

Bonferroni Intervals: Heuristic arguments (see the appendix to Thomas and Rao 1987)
suggest that the simple Bonferroni intervals can be improved by replacing $z_{\alpha'/2}$ in (9) by
$t_{r-1}(\alpha'/2)$, the upper $\alpha'/2$ percentage point of Student's $t$ distribution on $r - 1$ degrees of
freedom. This strategy will also be applied to the transformed Bonferroni intervals.


4. THE DESIGN OF THE MONTE CARLO STUDY 


4.1 Parameters and Random Numbers 


The parameters to be controlled are: (i) the nominal coverage level $(1 - \alpha)$ of the SCI;
(ii) $\boldsymbol{\pi}$, the model probability vector; (iii) $k$, the number of categories; (iv) $r$, the number of sample
clusters; (v) $m$, the number of units drawn from each sampled cluster; (vi) $\bar{\lambda}$, the mean of the
generalized design effects (eigenvalues); (vii) $a$, the coefficient of variation of the generalized
design effects. The nature and degree of clustering is represented by the pair ($\bar{\lambda}$, $a$) as follows:
(a) multinomial sampling ($\bar{\lambda} = 1$, $a = 0$); (b) constant design effect clustering ($\bar{\lambda} > 1$,
$a = 0$); (c) non-constant design effect clustering ($\bar{\lambda} > 1$, $a > 0$).

Individual Monte Carlo experiments were run for particular combinations of $k$, $\bar{\lambda}$, $a$ and
$r_{\max}$, the latter being the maximum number of clusters generated in one computer run. Most
experiments were run at two values of $\bar{\lambda}$, namely 1.5 and 2.0, two values of $a$, namely $a = 0$
(constant design effects) and $a > 0$ (one level of non-constant design effects), for equiprobable
categories ($\pi_i = 1/k$, $i = 1, \ldots, k$). Three values of $k$ ($k$ = 3, 5, 8) were initially selected
to cover the range of numbers of categories commonly encountered in goodness-of-fit tests.
An additional run was subsequently done for the case $k = 12$, $\bar{\lambda} = 2$ and $a > 0$ to check on
the range of applicability of the results. The number of units per cluster was set at $m = 10$
for $k$ = 3, 5 and 8, and at $m = 20$ for $k = 12$. Preliminary investigations showed coverage
rates to be insensitive to the value of this parameter. For comparability of results over $k$, the
non-zero settings of $a$ were selected to make $a/a_{\max}$ the same for each selected value of $k$,
where $a_{\max} = (k - 2)^{1/2}$ is the maximum possible value of $a$. For $k = 5$, the non-zero value
of $a$ was set at 0.5, which is typical of the values encountered in practice, e.g., $a = 0.43$ for
$k = 5$, as reported by Rao and Thomas (1988).
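
The rule of holding $a/a_{\max}$ fixed across $k$ can be illustrated with a short calculation. The function name `a_max` and the dictionary `settings` are hypothetical; the resulting values for k = 3 and k = 12 are implied by the rule rather than quoted in the text.

```python
import math


def a_max(k):
    """Largest coefficient of variation of k - 1 non-negative eigenvalues."""
    return math.sqrt(k - 2)


ratio = 0.5 / a_max(5)  # fixed by the choice a = 0.5 at k = 5
settings = {k: ratio * a_max(k) for k in (3, 5, 8, 12)}
for k, a in settings.items():
    print(k, round(a, 3))
```

For k = 8 this gives a value close to 0.71, the setting referred to elsewhere in the paper.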

The initial focus on equiprobable categories allowed for a cost effective assessment of the 
influence of k, \ and a on coverage rates, and eliminated many of the possible SCI variants 
from further consideration. Additional experiments reported in Section 7 show that the 
procedures that passed this initial screening can in fact be applied when the cell probabilities 
are markedly unequal. Vectors of unequal probabilities were confined to the class
$\boldsymbol{\pi}(k, q, \phi)$, defined by the elements $\pi_i = \phi$, $i = 1, \ldots, q$ and $\pi_i = (1 - q\phi)/(k - q)$,
$i = q + 1, \ldots, k$.
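
A vector in this class can be built directly from its three parameters; the function name `unequal_probabilities` is hypothetical.

```python
def unequal_probabilities(k, q, phi):
    """First q elements equal phi; the other k - q share the remaining mass."""
    if not (1 <= q < k) or not (0 < q * phi < 1):
        raise ValueError("need 1 <= q < k and 0 < q * phi < 1")
    rest = (1 - q * phi) / (k - q)
    return [phi] * q + [rest] * (k - q)


print(unequal_probabilities(5, 2, 0.425))
print(unequal_probabilities(8, 1, 0.65))
```

Both calls leave 0.05 for each of the remaining elements, as in the vectors examined in Section 7.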

For details of the generation of the random clusters from the mixture Dirichlet multinomial 
distribution, the reader is referred to Thomas and Rao (1987). Each Monte Carlo experiment 
consisted of 1000 sets of up to 100 independent clusters, grouped into nested subsets. All SCI 
procedures were applied in turn to each subset, using two nominal coverage levels (95% and 
90%), thus improving the precision of comparisons between procedures at the same param- 
eter settings, and between the same SCI procedures for different numbers of clusters. Most 
of the results presented will be for 95% nominal coverage; trends for 90% coverage were found 
to be qualitatively similar. 


4.2 Evaluation Procedures 


The percentage of Monte Carlo trials for which at least one of the k confidence intervals 
fails to cover the true parameter value is reported, and used for a preliminary screening of the 
main SCI procedures. This is a measure of the family error rate, which is equivalent to the 
actual significance level of the SCI when the latter is viewed as a test of goodness-of-fit. The 
family error rate, which will be referred to in this paper as the total error rate $ER_T$, is used
in place of the more commonly reported actual coverage rate (equal to one hundred percent 
minus the total error rate) because it can be conveniently split into two one-sided rates which 
will provide information on the symmetry or ‘unbiasedness’ of each SCI procedure. Jennings 
(1987) argued that coverage rates alone can provide a misleading assessment of single param- 
eter confidence interval procedures, and recommended that the number of times that an interval 
falls above and below the true parameter value should be separately reported. In this paper, 
Jennings' suggestion has been adapted to simultaneous confidence intervals on $\pi_i$, $i \in I$, where
$I$ is the index set $\{1, \ldots, k\}$, by counting the number of Monte Carlo trials for which:


(a) more intervals fall above their corresponding $\pi_i$, $i \in I$, than fall below;
(b) more intervals fall below their corresponding $\pi_i$, $i \in I$, than fall above;
(c) the same number (> 0) of intervals fall above their corresponding $\pi_i$, $i \in I$, as fall below.

Upper and lower error rates are then defined as $ER_U = [n_a + (n_c/2)]/N_t$ and $ER_L =
[n_b + (n_c/2)]/N_t$, respectively, where $N_t$ represents the number of Monte Carlo trials, and
$n_a$, $n_b$ and $n_c$ denote the counts (a) through (c), respectively. The sum of $ER_U$ and $ER_L$ is
clearly equal to the total error rate, $ER_T$. These one-sided error rates will be used to compare
SCI procedures whose overall error rates are acceptably close to the nominal rate $\alpha$, over a
range of parameter settings and cluster strengths. Average interval lengths and corresponding
standard errors have also been computed, and will be used as final discriminators in the selection
of the recommended procedures.
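
The counting rule (a) through (c) and the resulting error rates can be sketched as follows. The function names and the four hand-made trials are hypothetical; an interval is taken to fall above its $\pi_i$ when its lower endpoint exceeds $\pi_i$, and below when its upper endpoint is less than $\pi_i$.

```python
def classify_trial(intervals, true_pi):
    """Return 'cover', 'a', 'b' or 'c' for one set of k simultaneous intervals."""
    above = sum(1 for (lo, hi), p in zip(intervals, true_pi) if lo > p)
    below = sum(1 for (lo, hi), p in zip(intervals, true_pi) if hi < p)
    if above == 0 and below == 0:
        return "cover"
    if above > below:
        return "a"
    if below > above:
        return "b"
    return "c"  # equal, non-zero numbers above and below


def error_rates(trials, true_pi):
    """Return (ER_U, ER_L, ER_T) as proportions of the Monte Carlo trials."""
    counts = {"cover": 0, "a": 0, "b": 0, "c": 0}
    for intervals in trials:
        counts[classify_trial(intervals, true_pi)] += 1
    n_t = len(trials)
    er_u = (counts["a"] + counts["c"] / 2) / n_t
    er_l = (counts["b"] + counts["c"] / 2) / n_t
    return er_u, er_l, er_u + er_l


true_pi = [0.5, 0.5]
trials = [
    [(0.40, 0.60), (0.40, 0.60)],  # both cover
    [(0.55, 0.70), (0.40, 0.60)],  # one interval above: case (a)
    [(0.30, 0.45), (0.40, 0.60)],  # one interval below: case (b)
    [(0.55, 0.70), (0.30, 0.45)],  # one above, one below: case (c)
]
print(error_rates(trials, true_pi))
```

Ties of type (c) are split equally between the upper and lower rates, so the two one-sided rates always sum to the total error rate.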


5. A SUMMARY OF RESULTS FOR TOTAL ERROR RATES 


All results in this section are given in terms of the total error rate $ER_T$, defined in Section
4. For lack of space, tables are presented only for the case of unequal design effects,
($a > 0$), with $\bar{\lambda} = 2$. More detailed results are given in Thomas (1989). In interpreting the
tabulated results, it should be noted that for 1000 Monte Carlo trials, binomial standard
errors of point estimates of true $ER_T$'s having magnitudes 5%, 10% and 20% are 0.7%,
0.9% and 1.3% respectively. As a general rule deviations from nominal rates, and differences
between the error rates of different SCI procedures will be noted only when they are large
enough to have practical significance, and exceed their Monte Carlo standard errors by a
factor of at least two.
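
The quoted Monte Carlo standard errors follow from the binomial formula $\sqrt{p(1 - p)/N}$; a quick check (the function name is hypothetical):

```python
import math


def mc_standard_error(rate, n_trials=1000):
    """Binomial standard error of an estimated error rate."""
    return math.sqrt(rate * (1 - rate) / n_trials)


for rate in (0.05, 0.10, 0.20):
    print(f"{rate:.0%}: {100 * mc_standard_error(rate):.2f}%")
```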


5.1 Multinomial Procedures


Results for multinomial intervals will only be summarized here; for details see Thomas
(1989). Under cluster sampling, error rates for Goodman's Bonferroni intervals (see equation
(9) with $\hat{v}_{ii}$ replaced by $\hat{\pi}_i(1 - \hat{\pi}_i)$) are unacceptably high except for values of $\bar{\lambda}$ close to 1, i.e.,
unless the effect of clustering is small. The Scheffé-Gold and multinomial Quesenberry-Hurst
intervals, on the other hand, can yield error rates that are close to the nominal value in certain
cases, whenever their inherent conservativeness balances the error inflating effects of clustering
(see also Andrews and Birdsall 1988). Unfortunately, this is not always the case; both procedures
can display inflated error rates ($ER_T = 2\alpha$) for realistic combinations of category numbers
and clustering strengths.

Multinomial procedures should therefore not be used with complex survey data. Procedures
are clearly required that directly account for the clustering, and provide good coverage for the 
required number of categories, over a wide range of clustering conditions. 


5.2 The Scheffé Procedures 


Total error rates for the $\chi^2$-based Scheffé procedure of equation (3) and its $F$-based variant
are summarized in Table 1 as functions of $r$, for the case $\alpha = 5\%$, $\bar{\lambda} = 2$ and $a > 0$. More
detailed graphs are given in Thomas (1989).

For the values of $k$ studied, $ER_T$ for the $\chi^2$-based Scheffé procedure of equation (3) increases
rapidly as the number of clusters decreases, so that it should never be used for small numbers
of clusters. The $F$-based variant, on the other hand, keeps $ER_T$ reasonably close to or below
$\alpha = 5\%$ for all $r$. As $r$ increases, $ER_T$ for $F$-based Scheffé remains fairly constant for the
case $k = 3$, but becomes increasingly conservative for $k \ge 5$, as does the $\chi^2$ version. These
empirical trends with varying $r$ can be explained in terms of two competing effects. As $r$
increases, error rates for both procedures approach their asymptotic levels which are bounded
above by 4.29%, 1.04% and 0.14%, for $k$ = 3, 5 and 8 respectively (see Section 3.1).

Table 1
Total Error Rates ($ER_T$) for Scheffé and Modified Q-H Intervals

As $r$ decreases, however, the conservativeness of the Scheffé procedures (for $k \ge 5$) will be
increasingly swamped by the effects of increasing non-normality of the estimated proportions,
$\hat{\boldsymbol{\pi}}$. For the $F$-based version, the inflation in error rate due to non-normality is less than for the
chi-squared version of equation (3), with the result that $ER_T$ for the $F$-based version never
seriously exceeds the nominal 5% rate. For moderate levels of clustering ($\bar{\lambda} = 1.5$), the
behaviour of the $F$-based procedure is qualitatively similar to that described above for the case
$\bar{\lambda} = 2$. From the point of view of total error rate, therefore, the $F$-based Scheffé procedure
is usable over a wide range of clustering situations, though its possible conservativeness is
a disadvantage.


5.3 Modified Quesenberry-Hurst Intervals


Total error rates for the first order modified Quesenberry-Hurst (Q-H) procedure of Section
3.2 are also shown in Table 1 for $\alpha = 5\%$, $\bar{\lambda} = 2$ and $a > 0$.

Total error rates are close to or below the nominal 5% for all combinations of $r$ and $k$ shown.
For moderate to large numbers of clusters ($r \ge 30$), error rates for $k$ = 5 and 8 are very
similar, being approximately one half of the nominal rate (true also when $k = 12$). For the
case of constant design effects (see Thomas 1989), error rates for first order modified Q-H
intervals are conservative for $k \ge 5$, particularly for large $r$. The absence of this Scheffé-like
conservativeness for the more realistic case of unequal design effects shown in Table 1 can again
be explained using the argument of Section 3.1. From equation (6), it is easily seen that the
asymptotic coverage of the first-order modified Q-H intervals is given by one minus the
probability that at least one of the random variables $(\hat{\pi}_i - \pi_i)^2/(\hat{\lambda}\pi_i(1 - \pi_i)/n)$, $i = 1, \ldots, k$,
will exceed the critical value $\chi^2_{k-1}(\alpha)$ asymptotically. When $a > 0$, these individual
random variables will not all be asymptotically distributed as chi-squared on one degree of
freedom, so that the bound of Section 3.1 does not apply. The true bound on the error rate
will be inflated since at least one of the random variables $(\hat{\pi}_i - \pi_i)^2/(\hat{\lambda}\pi_i(1 - \pi_i)/n)$ will
be stochastically larger than $(\hat{\pi}_i - \pi_i)^2/(v_{ii}/n)$, whenever $a > 0$.

Trends for the case $\bar{\lambda} = 1.5$ are similar (Thomas 1989). Overall, the results show that from
the point of view of total error rates, first-order modified Q-H intervals provide a safe but
somewhat conservative SCI procedure under realistic clustering conditions.


5.4 Simple Bonferroni Intervals 


Total error rates for the simple Bonferroni intervals given by equation (9) are summarized
in Table 2 for the case $\alpha = 5\%$, $\bar{\lambda} = 2$, $a > 0$, and $k$ = 3, 5 and 8. Also shown are corresponding
error rates for the $t$-based variants described in Section 3.5.

From Table 2, it is evident that the error performance of both sets of SCI's is poor, both
showing a strong tendency to high error rates for small to medium numbers of clusters when
$k$, the number of categories, is five or more. Using critical values of Student's $t$ distribution
to compensate for the variability in the estimated variances of the category proportions clearly
has the effect of generally lowering error rates. As can be seen from Table 2, however, this
strategy is unable to prevent significant error rate inflation in the $t$-based intervals as the number
of clusters decreases, except when $k = 3$. The trend to inflated error rates for small numbers
of clusters (for both $z$ and $t$-based intervals), is due to the increasing non-normality of the $\hat{\pi}_i$'s
with decreasing $r$. This trend gets progressively more severe as $k$ increases, which is to be
expected since non-normality will become more pronounced, for a given value of $r$, as the values
of the $\pi_i$'s get smaller. This is precisely what happens with increasing $k$ in the case under study,
for which $\pi_i = 1/k \ \forall i$.

When $k = 3$, error rates for the $t$-based procedure are essentially constant, and close to
the nominal level. For $k = 8$, on the other hand, $ER_T$ varies from close to 20% at $r = 15$ to
approximately 8% at $r = 100$. From Table 2, and other results not shown, it appears that for
$k = 8$, simple $t$-based intervals approach their Bonferroni limits very slowly as $r \to \infty$. Also,
for $k \le 5$, error rates are close to the nominal level for moderate to large numbers of clusters
($r \ge 40$). Results for constant design effects, and for the case $\bar{\lambda} = 1.5$ are consistent with
the above. From the point of view of total error rates (or equivalently of coverage rates), it
is clear that simple $t$-based Bonferroni intervals are usable in practice over a range of realistic
clustering situations only if $k \le 5$ and $r \ge 40$.


Table 2 


Total Error Rates for $z$ and $t$-Based Simple Bonferroni Intervals


5.5 Transformed Bonferroni Intervals


The more detailed results given in Thomas (1989) demonstrate that the problem of error
rate inflation exhibited by simple $z$-based Bonferroni intervals is not solved by the use of
transformations alone. All three transformed $z$-based intervals again display severely inflated
error rates for small to medium numbers of clusters. Fortunately, the effect of transformations
on the $t$-based Bonferroni intervals is very different, as can be seen from the results
summarized in Table 3.

For $k$ = 3, 5 and 8, error rates for the log and logit intervals are close to the nominal 5%
for all $r$ values shown, with the logit intervals yielding slightly lower rates than the log intervals
(see the footnote to Table 3). The $t$-based square root intervals, on the other hand, exhibit the
undesirable characteristic of error rate inflation for small $r$, when $k = 8$; they will not be
considered further. For large numbers of categories ($k = 12$), both log and logit intervals
do exhibit some error rate inflation for intermediate numbers of clusters ($r = 30$). This is not
a serious drawback, however, as this number of categories is rarely encountered in practice.
Results for constant design effects, and for the case $\bar{\lambda} = 1.5$ are generally similar to those
described above.

It thus appears that for the ranges of $k$, $r$, $\bar{\lambda}$ and $a$ that are likely to be encountered in
practice, log and logit transformations (which reduce the non-normality in $\hat{\boldsymbol{\pi}}$) used in combination
with $t$-based critical values (which compensate for the variability in the estimated
variances) do yield intervals that provide the desired degree of control. These intervals will be
explored further in Section 6 in terms of the symmetry of their error rates.


Table 3 


Total Error Rates¹ for $t$-based Transformed Bonferroni Intervals;
$\alpha = 5\%$, $\bar{\lambda} = 2$, $m = 10$ for $k \le 8$, $m = 20$ for $k = 12$

Total Error Rate ($ER_T$), $t$-based Transformed Bonferroni

                           Square Root    Log     Logit
[row labels illegible]        12.9        10.1    10.2

¹ For $k = 8$ and $r = 50$, the correlation between $ER_T$ estimates for log and logit intervals is 0.92.
Assuming this is typical for all $r$ and $k$, the Monte Carlo standard error of the difference in log and
logit error rates is approximately 0.3%.


Table 4

Percentage Asymmetry ($PER_U$)¹ in the Total Error Rate for the Viable Procedures;
$a > 0$², $r = 50$, $m = 10$ for $k \le 8$, $m = 20$ for $k = 12$

$PER_U = (ER_U/ER_T) \times 100\%$

                           Scheffé      Modified Q-H     t-based Bonferroni
$\alpha$     $k$    $\bar{\lambda}$     (F-based)    (first order)    (log)     (logit)
 5%      3    1.5       19.2          58.7         61.0      48.9
 5%      5    2.0        0.0          45.0         61.1      48.8
 5%      8    1.5        0.0          63.2         67.5      56.8
 5%      8    2.0        0.0          65.2         64.9      49.0
 5%     12    2.0        0.0          46.9         53.8      51.6
10%      3    1.5       16.3          49.4         59.2      48.4
10%      5    2.0        6.1          50.0         61.8      48.6
10%      8    1.5        0.0          60.7         67.3      55.8
10%      8    2.0        0.0          65.6         60.7      50.0
10%     12    2.0        0.0          47.5         56.0      51.4

¹ For $k = 8$, $\bar{\lambda} = 2$ and $\alpha = 5\%$, the correlation between $PER_U$ estimates for log and logit intervals
is 0.82. Assuming this is typical, Monte Carlo standard errors for differences in log and logit $PER_U$'s
are approximately 4% and 3% for $\alpha$ = 5% and 10%, respectively.

² For values of $a$ for specific $k$, see Table 3.


6. ERROR RATE SYMMETRIES FOR THE VIABLE PROCEDURES 


This section presents results on error rate symmetry based on the decomposition of the total
error rate $ER_T$ into its two additive components $ER_U$ and $ER_L$, as described in Section 4. The
measure used in the tables is $(ER_U/ER_T) \times 100\%$, i.e., the upper error rate expressed as a
percentage of the total error rate. It will be denoted $PER_U$. A symmetric SCI will have an
empirical $PER_U$ that is close to 50%; a $PER_U$ that is greater (less) than 50% will indicate an
increased probability of non-coverage due to intervals lying above (below) their respective $\pi_i$'s.
For values of percentage symmetry between 50% and 80%, 95% confidence intervals on the
true $PER_U$ are approximately $(PER_U \pm 14)\%$ and $(PER_U \pm 10)\%$ for total error rates of
5% and 10% respectively.
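
These approximate half-widths can be recovered with a short calculation, under the assumption (made here for illustration, not stated in the text) that, given the number of non-covering trials, the count of upper errors is binomial. The function name is hypothetical.

```python
import math


def per_u_half_width(per_u, total_error_rate, n_trials=1000, z=1.96):
    """Approximate 95% half-width (in percentage points) for PER_U."""
    n_err = total_error_rate * n_trials  # expected number of non-covering trials
    p = per_u / 100
    return 100 * z * math.sqrt(p * (1 - p) / n_err)


for er_t in (0.05, 0.10):
    print(er_t, round(per_u_half_width(50, er_t), 1), round(per_u_half_width(80, er_t), 1))
```

The half-width is largest at a $PER_U$ of 50% and shrinks somewhat toward 80%, so the quoted values act as upper limits over that range.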


6.1 Modified Scheffé and Quesenberry-Hurst Intervals 


Percentage symmetry results for the F-based Scheffé and the first order Quesenberry-Hurst 
(Q-H) intervals are given in Table 4 for a selection of parameter values. It can be seen that 
the Scheffé procedure displays extreme asymmetry, making it an unattractive SCI. The first 
order modified Q-H procedure displays only moderate asymmetry, and is therefore the better 
of the two in practice. 

The source of the asymmetry in the Scheffé intervals is again the non-normality of the
untransformed $\hat{\pi}_i$'s. In particular, the fact that "small" $\hat{\pi}_i$'s generate "small" estimates of the
variances $v_{ii}$ and hence shorter intervals (cf. the multinomial case where $\hat{v}_{ii}/n = \hat{\pi}_i(1 - \hat{\pi}_i)/n$,
$i = 1, \ldots, k$) increases the probability that non-covering intervals will lie below their respective
$\pi_i$'s. This tendency to asymmetry will increase as the total error rate decreases, making
the $F$-based Scheffé procedure particularly vulnerable to this effect. Since Scheffé intervals differ
from simple Bonferroni intervals only through the critical constant used, asymmetry is also
to be expected in the latter though it should not be as severe given that error rates for simple
Bonferroni intervals are liberal. This is confirmed by study results, e.g., $PER_U = 4.9\%$ for
simple $t$-based Bonferroni intervals when $r = 50$, $k = 8$ and $a = 0.71$.


6.2 t-Based Transformed Bonferroni Intervals


Table 4 also gives percentage symmetry results for t-based Bonferroni intervals based on 
the log and logit transformations. The results of the table suggest that logit intervals do provide
more symmetric coverage than the log intervals, when $k$ is in the range 5 to 8. Thus logit
intervals might be considered preferable in practice to log intervals from the point of view of 
error rate symmetry. 


7. UNEQUAL CELL PROBABILITIES 


Table 5 presents results on total error rates and error rate symmetry under unequal cell
probabilities for the $t$-based log and logit transformed Bonferroni procedures, together with results
for the first order modified Q-H procedure. Results are tabulated for six sets of unequal
probabilities, three for the case $k = 5$, $\bar{\lambda} = 2$, $a = 0.5$, namely $\boldsymbol{\pi}(5, 3, .3)$, $\boldsymbol{\pi}(5, 2, .425)$ and
$\boldsymbol{\pi}(5, 1, .8)$ (see Section 4.1), and three for the case $k = 8$, $\bar{\lambda} = 2$, $a = 0.71$, namely $\boldsymbol{\pi}(8, 3, .25)$,
$\boldsymbol{\pi}(8, 2, .35)$ and $\boldsymbol{\pi}(8, 1, .65)$. For each $\boldsymbol{\pi}$ vector, the remaining $k - q$ elements all equal 0.05.
Results for equiprobable cells are also displayed in Table 5 for comparison.

It can be seen that deviations from equiprobability do affect total error rates for the
modified Q-H procedure, particularly when $k = 8$. With the first element $\pi_1 = 0.65$ the total
error rate of modified Q-H is close to its error rate under equiprobability. For the other two
cases studied ($\pi_1 = \pi_2 = .35$, and $\pi_1 = \pi_2 = \pi_3 = 0.25$), total error rates are considerably
lower, closer in fact to the modified Q-H results obtained for the constant design effect case
(see Thomas 1989). This difference in total error rates occurs because the pattern of cell design
effects is different for each set of unequal probabilities, though the pattern of generalized
design effects (the $\lambda$'s) remains the same ($\lambda_1 = 2 + 2\sqrt{3}$, $\lambda_j = 2 - \sqrt{3}/3$, $j = 2, \ldots, 7$ for
$\bar{\lambda} = 2$, $a = \sqrt{2}/2 = .707$). When $\pi_1 = 0.65$, the cell design effects are $d_1 = 5.7$, $d_i = 1.82$,
$i = 2, \ldots, 8$.


Table 5

The Effect of Unequal Cell Probabilities on the Total Error Rates ($ER_T$)
and Percentage Asymmetries ($PER_T$) of the Modified Q-H
and Transformed Bonferroni Procedures;
$r = 50$, $\bar{\lambda} = 2$, $\alpha = 5\%$, $m = 10$

                                          Procedures
                          Modified Q-H     t-based Bonferroni
                          (first order)    (log)            (logit)
k   Probability vector    ER_T   PER_T     ER_T   PER_T     ER_T   PER_T
5   $\pi$(5,1,0.8)        [?]    [?]       5.6    [?]       4.4    62.5
5   $\pi$(5,2,0.425)      1.4    82.1      4.8    [?]       4.6    47.8
5   $\pi$(5,3,0.3)        [?]    76.7      4.2    [?]       3.9    38.5
5   equi-prob.            2.0    45.0      4.5    61.1      4.0    48.8
8   $\pi$(8,1,0.65)       2.7    63.0      6.3    68.3      5.4    55.6
8   $\pi$(8,2,0.35)       0.6    83.3      4.9    58.2      4.4    51.2
8   $\pi$(8,3,0.25)       0.7    100       5.2    68.2      4.6    63.1
8   equi-prob.            [?]    66.5      6.0    64.0      [?]    49.0

[?] marks an entry that is not legible in the source.

Use of a uniform adjustment factor ( ) will thus seriously underestimate the variance of the
first estimated cell probability, leading to inflation of the error rate of the modified Q-H
procedure. That the nominal error rate $\alpha = 5\%$ is not exceeded is due to the inherent
conservativeness of modified Q-H intervals in the constant design effect case (see Section 5.3). When
$\pi_1 = \pi_2 = 0.35$, the corresponding design effects are [values not legible in the source].
These are much closer to constant design effects ($d_j = 2.0$, $j = 1, \ldots, 8$), hence the
conservative behaviour of the intervals in this case. It can also be seen from Table 5 that conservative
$ER_T$'s are associated with highly asymmetric error rates.
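
The generalized design effects quoted above for k = 8 can be checked numerically. The sketch below is an illustration only: the helper names `mean_and_cv` and `prob_vector` are hypothetical, and the coefficient of variation is assumed to use the population form of the variance (division by the number of values). It confirms that one value of 2 + 2√3 and six values of 2 − √3/3 have mean 2 and coefficient of variation √2/2, and that the probability vectors of Table 5 each sum to one.

```python
import math

def mean_and_cv(values):
    """Return the mean and the coefficient of variation (population form)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var) / mean

def prob_vector(k, q, c, rest=0.05):
    """q leading cells equal to c, remaining k - q cells equal to `rest`."""
    return [c] * q + [rest] * (k - q)

# k = 8 cells give k - 1 = 7 generalized design effects.
lambdas = [2 + 2 * math.sqrt(3)] + [2 - math.sqrt(3) / 3] * 6
lam_mean, lam_cv = mean_and_cv(lambdas)

# The six unequal probability vectors described in the text.
vectors = [
    prob_vector(5, 3, 0.3), prob_vector(5, 2, 0.425), prob_vector(5, 1, 0.8),
    prob_vector(8, 3, 0.25), prob_vector(8, 2, 0.35), prob_vector(8, 1, 0.65),
]
totals = [sum(v) for v in vectors]
```
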

Despite the variation in cell design effects implied by the different probability vectors of
Table 5, it can be seen that the transformed Bonferroni procedures exhibit very stable
performance. Total error rates (for 50 clusters) are close to the nominal rate ($\alpha = 5\%$) for both
log and logit intervals, and neither exhibits serious asymmetry. Total error rates
corresponding to unequal probabilities do decrease with decreasing $r$ over the range $r = 50$ to $r = 15$
when $k = 8$ (results not shown). Variations in $ER_T$ are not severe, however; when $r = 15$
clusters the minimum rate for the cases examined is approximately 2%.


8. SUMMARY, CONCLUSIONS AND RECOMMENDATIONS 


In the search for procedures that take direct account of the survey design and that provide 
adequate control of error rates and error rate symmetry over a wide range of problem and 
clustering situations, Scheffé intervals based on estimated cell variances must be rejected: the 
chi-squared version of equation (3) on the grounds of poor error control, and the F-based 
version on the grounds of extreme asymmetry. Modifications to Quesenberry-Hurst intervals 
are somewhat conservative, though the version based on the first order Rao-Scott correction 
does provide a viable procedure. For Bonferroni intervals, the benefits of using critical points
of the t-distribution instead of the standard normal are substantial. Even so, intervals based
on $\hat{\pi}$ and its square root provide inadequate control of total error rates, particularly for
small numbers of clusters when the distribution of $\hat{\pi}$ becomes increasingly non-normal. On the
other hand, t-based Bonferroni intervals using both the log and logit transformations provide good
control of total error rates and error rate symmetry, and are clearly superior to all other
competing intervals. Both log and logit transformed intervals (t-based) also appear to provide good
control of error rates and error rate symmetry when the cell probabilities are unequal, differing
in the cases studied by a ratio (maximum to minimum) of up to sixteen. From the point of view
of total error rates there is little to choose between the log and logit intervals, though error
rates for the latter are consistently a little lower. Logit intervals are superior from the point
of view of symmetry, however. Estimates of confidence interval lengths (detailed results not
shown) also favour the logit intervals, despite their slightly lower error rates. For example,
for the equiprobable case with $\alpha = 5\%$, $k = 5$, $\bar{\lambda} = 2$, $a = 0.5$ and $r = 50$,
the average length of the confidence interval on $\pi_1$ (expressed as a 95% confidence interval)
was .1915 ± .0014 for the log-based interval, and .1850 ± .0014 for the logit-based interval. For
the case of unequal probabilities, with $\alpha = 5\%$, $k = 8$, $\bar{\lambda} = 2$, $a = 0.71$,
$r = 50$, $\pi_1 = 0.65$ and $\pi_2 = 0.05$ (see Table 5), 95% confidence intervals for the average
lengths of log and logit intervals were: for $\pi_1$, .2865 ± .0012 and .2776 ± .0011, respectively;
for $\pi_2$, .0806 ± .0010 and .0789 ± .0011, respectively.

Before final recommendations are made, it is necessary to consider possible limitations 
imposed by the design of the Monte Carlo study. A potentially limiting feature is the use of 
a single specific sampling design, namely two-stage cluster sampling with SRS at the second 
stage, given that practitioners will encounter data collected using a range of survey designs that
might include stratification and multiple levels of unit selection. For large samples, the rele- 
vant distribution theory requires knowledge only of first and second moments, assuming that 
a suitable central limit theorem applies (see for example Rao and Scott 1981). This study will 
therefore yield valid recommendations for large numbers of clusters, or more generally for 
large numbers of degrees of freedom for variance estimation (Rao and Thomas 1988), as long 
as the covariance matrix V/n and hence the generalized design effects can be appropriately 
modelled. Since the Dirichlet mixture model used in this study yields generalized design effects 
having means and coefficients of variation that are typical of those found in practice, recom- 
mendations based on a large number of clusters or degrees of freedom (fifty or more) can be 
made with confidence. For small to moderate numbers of clusters, quantitative results may 
differ from design to design. Since the basic mechanisms underlying the results exhibited in 
this study, namely increasing non-normality of $\hat{\pi}$ for decreasing $r$ plus the inherent
conservativeness of Scheffé-like procedures, will apply in general, it is expected that the qualitative
trends for the different statistics examined will be generalizable across a wide variety of designs, 
even when the number of clusters is not large. The basic aim of the study has been to identify 
procedures whose control of error rates is robust to variations in the study parameters, namely 
the number of categories, the number of clusters, the strength of clustering, and the skewness 
of the vector of category probabilities. The combination of parameters examined has covered 
much of the range likely to be encountered in practice, so it is reasonable to suggest that the 
robustness exhibited by the log and logit transformed Bonferroni intervals might extend to 
variations in survey design, for moderate numbers of clusters (or degrees of freedom). Further 
research on this question is clearly required. 

Subject to these caveats, t-based Bonferroni simultaneous confidence intervals based
on the logit transformation are recommended for assessing up to k = 12 proportions of 
varying magnitude, under realistic clustering conditions. If conservativeness is deemed to be 
an asset, the first-order modified Quesenberry-Hurst procedure can be safely used. Both pro- 
cedures require only a knowledge of the variances (or design effects) of the estimated cell 
proportions. 
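
A minimal sketch of logit-transformed Bonferroni intervals of the kind recommended above is given below. It assumes the usual delta-method construction, in which the interval is formed on the logit scale with standard error se/[p(1 − p)] and then transformed back to the probability scale. The function name is hypothetical, and the critical value `crit` is passed in by the caller (for example the upper α/(2k) point of a t distribution whose degrees of freedom depend on the number of clusters); both choices are assumptions made for illustration rather than the procedure's definitive form.

```python
import math

def logit_bonferroni_intervals(p_hat, se, crit):
    """Back-transformed logit intervals for each estimated proportion.

    p_hat : estimated cell proportions, each strictly between 0 and 1
    se    : design-based standard errors of the estimated proportions
    crit  : critical value supplied by the caller (assumed Bonferroni-adjusted)
    """
    intervals = []
    for p, s in zip(p_hat, se):
        centre = math.log(p / (1.0 - p))
        # Delta method: standard error of logit(p) is s / (p * (1 - p)).
        half = crit * s / (p * (1.0 - p))
        lower = 1.0 / (1.0 + math.exp(-(centre - half)))
        upper = 1.0 / (1.0 + math.exp(-(centre + half)))
        intervals.append((lower, upper))
    return intervals

# Illustrative inputs only; these numbers are not taken from the study.
example = logit_bonferroni_intervals([0.65, 0.05], [0.04, 0.01], crit=2.9)
```

The back-transformed limits always lie strictly between 0 and 1 and are asymmetric about the estimate, which is the property that matters for small proportions.
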
Logistic Regression Under Complex
Survey Designs 


JORGE G. MOREL¹


ABSTRACT 


Estimation procedures for obtaining consistent estimators of the parameters of a generalized logistic 
function and of its asymptotic covariance matrix under complex survey designs are presented. A cor- 
rection in the Taylor estimator of the covariance matrix is made to produce a positive definite covariance 
matrix. The correction also reduces the small sample bias. The estimation procedure is first presented 
for cluster sampling and then extended to more complex situations. A Monte Carlo study is conducted 
to examine the small sample properties of F-tests constructed from alternative covariance matrices. The 
maximum likelihood estimation method where the survey design is completely ignored is compared with 
the usual Taylor’s series expansion method and with the modified Taylor procedure. 


KEY WORDS: Pseudo-likelihood; CPLX procedure; Cluster sampling; Adjusted covariance matrix. 


1. INTRODUCTION 


In the last few years a lot of attention has been given to the problems that arise when chi- 
square tests based on the multinomial distribution are applied to data obtained from complex 
sample designs. It has been shown that the effects of stratification and clustering on the
chi-square tests may lead to a distortion of nominal significance levels. Holt, Scott and Ewings
(1980) proposed modified Pearson chi-square statistics for tests of goodness-of-fit, homogeneity,
and independence in two-way contingency tables. Rao and Scott (1981) presented similar tests 
for complex sample surveys. In all these cases, the correction factor requires only the knowl- 
edge of variance estimates (or design effects) for individual cells. Bedrick (1983) derived a cor- 
rection factor for testing the fit of hierarchical log linear models with closed form parameter 
estimates. Rao and Scott (1984) presented more extensive methods of using design effects to 
obtain chi-square tests for complex surveys. They generalized their previous results to multi- 
way tables. Fay (1985) presented the adjustments to the Pearson and likelihood test statistics 
through a jackknife approach. 

The use of the conditional logistic model (Cox 1970) has become increasingly popular in
the context of complex survey designs. Under suitable conditions, Binder (1983) proved the
asymptotic normality of the design-based sampling distribution for a family of parameter
estimators that cannot be defined explicitly as a function of other statistics from the sample.
His results are applied to binary logistic models. Further applications to the Canada Health
Survey are also found in Binder et al. (1984).

Chambless and Boyle (1985) derived a general asymptotic distribution theory for stratified 
random samples with a fixed number of strata and increasing stratum sample sizes. Their 
theoretical results were illustrated with logistic regression and discrete proportional hazards
models. Albert and Lesaffre (1986) discussed the logistic discrimination method for
classifying multivariate observations into one of several populations. They restrict their attention
to discrimination between qualitatively distinct groups. 


¹ Jorge G. Morel is Assistant Professor of the Department of Epidemiology and Biostatistics, University of South
Florida, Tampa, Florida 33612. 
Extensions to the case where the response consists of a polychotomous variable have been
made by Bull and Pederson (1987) and Morel (1987). They show, by using Taylor's series
expansion, that the large sample variance of the beta estimates has the form

$$H^{-1} G H^{-1},$$

where $H^{-1}$ is the covariance matrix that wrongly results from assuming independence and a
multinomial distribution in the response vector, and $G$ is a matrix whose estimation is based
on the complex survey design.

More recently, Roberts, Rao and Kumar (1987) showed how to make adjustments that take 
into account the survey design in computing the standard chi-square and the likelihood ratio 
test statistics for logistic regression analysis involving a binary response variable. The 
adjustments are based on certain generalized design effects. Their results can be applied to cases 
where the whole population has been divided into $I$ domains of study, a large sample is obtained
for each domain, and in each domain a proportion $\pi_i$, $i = 1, 2, \ldots, I$, is to be estimated.
It is assumed that

$$\pi_i = [1 + \exp(x_i' \beta^0)]^{-1} \exp(x_i' \beta^0), \quad i = 1, 2, \ldots, I,$$

where $x_i$ is a $k$-vector of known constants derived from the $i$-th domain and $\beta^0$ is a
$k$-vector of unknown parameters. This procedure may be most useful when only the summary table
of counts and variance adjustment factors are available, instead of the complete data set.

In this paper an estimation procedure is presented for obtaining consistent estimators of 
the parameter vector of a generalized logistic model and its asymptotic covariance matrix when 
a complex sampling design is employed. The resulting estimated covariance matrix is always 
positive definite and asymptotically equivalent to the one obtained from Taylor’s series expan- 
sion. A correction for reducing the small sample bias in the estimated covariance matrix is also 
introduced. It is shown, via a Monte Carlo study, that this correction brings the inflated
Type I error that arises from ignoring the complex survey design down towards the nominal level
at smaller sample sizes than the Taylor's series expansion does. In this sense the correction
proposed here produces, for small samples, results that are
superior to the usual delta-method.

The new procedure will be termed, henceforth, the CPLX procedure, or simply CPLX. The 
maximum likelihood estimation method and the Taylor’s series expansion method will be 
termed MLE and TAYLOR, respectively. The CPLX procedure has been incorporated into
PC CARP, a personal computer program for variance estimation with large scale surveys; see
Schnell et al. (1988).


2. LOGISTIC REGRESSION WITH CLUSTER SAMPLING 


Consider first single-stage cluster sampling where $n$ clusters or primary sampling units
are taken with known probabilities with replacement from a finite population or without
replacement from a very large population. Let $m_j$ represent the size of the $j$-th cluster,
$j = 1, 2, \ldots, n$, and let $y^*_{j1}, y^*_{j2}, \ldots, y^*_{jm_j}$ denote $(d + 1)$ dimensional
classification vectors. The vector $y^*_{j\ell}$ consists entirely of zeros except for position $r$,
which will contain a one if the $\ell$-th unit selected from the $j$-th cluster falls in the $r$-th
category. Let $x_{j\ell}$ be a $k$-dimensional row vector of explanatory variables associated with
the $\ell$-th unit selected from the $j$-th cluster. For each $j = 1, 2, \ldots, n$ and each
$\ell = 1, 2, \ldots, m_j$, the expectation of the $r$-th element of $y^*_{j\ell}$ is determined by a
logistic relationship as

$$\pi_{j\ell r} = E(y^*_{j\ell r}) = \Big[1 + \sum_{s=1}^{d} \exp(x_{j\ell} \beta^0_s)\Big]^{-1} \exp(x_{j\ell} \beta^0_r), \quad r = 1, \ldots, d,$$

$$\pi_{j\ell, d+1} = 1 - \sum_{r=1}^{d} \pi_{j\ell r}. \qquad (2.1)$$

Because the expected value function is nonlinear in the parameter vector
$\beta^0 = (\beta^{0\prime}_1, \beta^{0\prime}_2, \ldots, \beta^{0\prime}_d)'$, it is necessary to use
nonlinear estimation methods. Define the pseudo log-likelihood $L_n(\beta)$ as

$$L_n(\beta) = \sum_{j=1}^{n} \sum_{\ell=1}^{m_j} w_j (\log \pi^*_{j\ell})' y^*_{j\ell}, \qquad (2.2)$$

where $\pi^*_{j\ell} = (\pi_{j\ell 1}, \ldots, \pi_{j\ell, d+1})'$ and $w_j$ is the sampling weight for
the $j$-th sampling unit. This function can be viewed as a weighted log likelihood function, where
the weights are the sampling weights and the $y^*_{j\ell}$'s are distributed as multinomial random
variables. If the sampling weights are all one, then (2.2) becomes the log-likelihood function
under the assumption that the $y^*_{j\ell}$'s are independently multinomially distributed.
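
The generalized logistic probabilities in (2.1) and the weighted log-likelihood in (2.2) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CPLX implementation: the function names and the data layout (a list of `(weight, units)` pairs, each unit holding a covariate list and a zero-based category index with the last index as the reference category) are hypothetical.

```python
import math

def category_probs(x, betas):
    """Probabilities of the d + 1 categories as in (2.1).

    x     : list of k explanatory values for one unit
    betas : list of d coefficient lists, one per non-reference category
    """
    etas = [sum(xi * bi for xi, bi in zip(x, b)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference category d + 1
    return probs

def pseudo_loglik(clusters, betas):
    """Weighted log-likelihood as in (2.2).

    clusters : list of (w_j, units); units is a list of (x, category) pairs,
               with category an index in 0..d (index d is the reference)
    """
    total = 0.0
    for w, units in clusters:
        for x, cat in units:
            total += w * math.log(category_probs(x, betas)[cat])
    return total

# With all coefficients zero, every category has probability 1 / (d + 1).
zero_betas = [[0.0] * 4 for _ in range(3)]
equal_probs = category_probs([1.0, 2.0, -1.0, 0.5], zero_betas)
```

Because each classification vector has a single one, the inner product in (2.2) reduces to the log of the probability of the observed category, which is what the loop accumulates.
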

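Let $\hat{\beta}_{PSEUDO}$ be the estimator of $\beta^0$ that maximizes (2.2). This estimator is a
solution to the system of equations

$$\sum_{j=1}^{n} \sum_{\ell=1}^{m_j} w_j\, G(\beta, x_{j\ell}) [\mathrm{Diag}(\pi^*_{j\ell})]^{-1} (y^*_{j\ell} - \pi^*_{j\ell}) = 0, \qquad (2.3)$$

where

$$G(\beta, x_{j\ell}) = [(I_{d \times d}, 0_{d \times 1}) \otimes x'_{j\ell}]\, \Delta(\pi^*_{j\ell}),$$

$$\Delta(\pi^*_{j\ell}) = \mathrm{Diag}(\pi^*_{j\ell}) - \pi^*_{j\ell} (\pi^*_{j\ell})',$$

and $\otimes$ denotes the Kronecker product.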


The asymptotic normality of $\hat{\beta}_{PSEUDO}$ can be proved by defining the parameters of
interest implicitly as in (2.2) and then by extending the results given in Binder (1983). An
alternative approach can be derived by making use of the pseudo-likelihood assumption and
Proposition 1 in Dale (1986). Binder and Dale both provide the necessary regularity conditions.

As $n$ increases,

$$\sqrt{n}\,(\hat{\beta}_{PSEUDO} - \beta^0) \doteq \sqrt{n}\,[H_n(\beta^0)]^{-1} U_n(\beta^0) \xrightarrow{L} N_{dk}\Big(0, \lim_{n \to \infty} n\,[H_n(\beta^0)]^{-1} G_n [H_n(\beta^0)]^{-1}\Big), \qquad (2.4)$$
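where

$$H_n(\beta^0) = \sum_{j=1}^{n} \sum_{\ell=1}^{m_j} w_j\, \Delta(\pi_{j\ell}) \otimes x'_{j\ell} x_{j\ell},$$

$$U_n(\beta^0) = \sum_{j=1}^{n} \sum_{\ell=1}^{m_j} w_j\, (y_{j\ell} - \pi_{j\ell}) \otimes x'_{j\ell},$$

$$G_n = \sum_{j=1}^{n} w_j^2\, \mathrm{Var}\Big(\sum_{\ell=1}^{m_j} y_{j\ell} \otimes x'_{j\ell}\Big),$$

$y_{j\ell}$ and $\pi_{j\ell}$ are the vectors $y^*_{j\ell}$ and $\pi^*_{j\ell}$ without their last
elements, respectively, and $N_{dk}$ denotes a $dk$-multivariate normal distribution.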

Nelder and Wedderburn (1972) have shown that, under the binomial assumption, the pseudo
log-likelihood function (2.2) can be maximized by an iterative weighted least-squares procedure.
Haberman (1974, p.48) shows that under regularity conditions a modified Newton-Raphson
algorithm converges to the maximum likelihood estimator for the multinomial case. His proof does
not depend on the existence of any consistent estimator of $\beta^0$, which allows the iterative
algorithm to be initialized at $\beta = 0$. Jennrich and Moore (1975) proved that when the
multinomial assumption holds, the common Gauss-Newton algorithm for finding the maximum likelihood
estimator of $\beta^0$ becomes the Newton-Raphson algorithm. Because of the equivalence of those
algorithms and because a modified Newton-Raphson procedure always converges, we have
adopted the modified Gauss-Newton algorithm described by Gallant (1987, p.318).

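CPLX first finds $\hat{\beta}_{PSEUDO}$ using an iterative procedure in which the estimate of
$\beta^0$ at the $q$-th step is

$$\beta_{(q, i(q))} = \beta_{(q-1, i(q-1))} + (0.5)^{i(q)} [H_n(\beta_{(q-1, i(q-1))})]^{-1} U_n(\beta_{(q-1, i(q-1))}), \qquad (2.5)$$

where $i(q)$ is a nonnegative integer such that

$$L_n(\beta_{(q, i(q))}) > L_n(\beta_{(q-1, i(q-1))}). \qquad (2.6)$$

The modification of the iteration algorithm provided by $i(q)$ guarantees the convergence of
the procedure. The iteration is initiated by setting $\beta_{(0,0)} = 0$. The algorithm is declared
to have converged when the condition

$$\frac{L_n(\beta_{(q, i(q))}) - L_n(\beta_{(q-1, i(q-1))})}{|L_n(\beta_{(q, i(q))})| + [\text{term not legible in the source}]} < \epsilon \qquad (2.7)$$

is satisfied, where $\epsilon$ can be a small negative power of 10 (the exponent is not legible in the source).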
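
The step-halving iteration (2.5)-(2.6) can be illustrated with a deliberately simplified case. The sketch below assumes a binary response with a single scalar coefficient (d = 1, k = 1), so that the matrix inverse becomes a division; the function name, the tolerance `eps = 1e-8`, the use of `eps` as the additive term in the denominator of the stopping rule, and the cap on the number of halvings are all assumptions made for illustration and are not taken from the source.

```python
import math

def fit_scalar_logit(data, eps=1e-8, max_iter=50):
    """data: list of (w, x, y) with y in {0, 1}; returns the fitted coefficient."""
    def loglik(b):
        return sum(w * (y * x * b - math.log1p(math.exp(x * b))) for w, x, y in data)

    def score_and_info(b):
        u = h = 0.0
        for w, x, y in data:
            p = 1.0 / (1.0 + math.exp(-x * b))
            u += w * (y - p) * x              # scalar analogue of U_n
            h += w * p * (1.0 - p) * x * x    # scalar analogue of H_n
        return u, h

    b = 0.0  # the iteration starts at zero, as in the text
    current = loglik(b)
    for _ in range(max_iter):
        u, h = score_and_info(b)
        step = u / h
        # Halve the step until the pseudo log-likelihood does not decrease.
        i = 0
        while loglik(b + 0.5 ** i * step) < current and i < 30:
            i += 1
        new_b = b + 0.5 ** i * step
        new = loglik(new_b)
        converged = (new - current) / (abs(new) + eps) < eps
        b, current = new_b, new
        if converged:
            break
    return b

# Intercept-only example: three successes and one failure.
b_hat = fit_scalar_logit([(1.0, 1.0, 1), (1.0, 1.0, 1), (1.0, 1.0, 1), (1.0, 1.0, 0)])
```

For an intercept-only model the maximizer is the logit of the weighted proportion of successes, which gives a simple check on the iteration.
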
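Observe that a consistent estimator of $H_n(\beta^0)$ is $H_n(\hat{\beta}_{PSEUDO})$ and a
distribution free estimator of $G_n$ is

$$\hat{G}_n = (n - 1)^{-1} n \sum_{j=1}^{n} (d_j - \bar{d})(d_j - \bar{d})', \qquad (2.8)$$

where

$$d_j = \sum_{\ell=1}^{m_j} w_j (y_{j\ell} - \pi_{j\ell}) \otimes x'_{j\ell},$$

and $\bar{d} = n^{-1} \sum_{j=1}^{n} d_j$. If within each cluster, the $y^*_{j\ell}$'s are independent
and identically distributed according to a multinomial random vector with parameters
$(\pi^*_j, 1)$, then it can be easily shown that the expectation of $\hat{G}_n$ is precisely
$H_n(\beta^0)$. In practice the $\pi_{j\ell}$'s in (2.8) are replaced with $\hat{\pi}_{j\ell}$, where
$\hat{\pi}_{j\ell}$ is defined as in (2.1) with $\hat{\beta}_{PSEUDO}$ substituted for $\beta^0$, and
a small correction is applied to obtain the estimator

$$\hat{G}_n = (n^* - k)^{-1} (n^* - 1) (n - 1)^{-1} n \sum_{j=1}^{n} (\hat{d}_j - \bar{\hat{d}})(\hat{d}_j - \bar{\hat{d}})', \qquad (2.9)$$

where

$$\hat{d}_j = \sum_{\ell=1}^{m_j} w_j (y_{j\ell} - \hat{\pi}_{j\ell}) \otimes x'_{j\ell}, \quad \bar{\hat{d}} = n^{-1} \sum_{j=1}^{n} \hat{d}_j \quad \text{and} \quad n^* = \sum_{j=1}^{n} m_j.$$
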
The factor

$$(n^* - k)^{-1} (n^* - 1) (n - 1)^{-1} n$$

reduces to $(n - k)^{-1} n$ if each cluster contains exactly one element. The factor
$(n - k)^{-1} n$ is the degrees of freedom correction applied to the residual mean square for
ordinary least squares in which $k$ parameters are estimated. The quantity in (2.9) is well defined
for two or more clusters and the factor $(n^* - k)^{-1} (n^* - 1)$ should reduce the small sample
bias associated with using the estimated function to calculate deviations. Therefore, a consistent
estimator of the asymptotic covariance matrix of $\hat{\beta}_{PSEUDO}$ under the cluster sampling
design is

$$\hat{A}_n = [H_n(\hat{\beta}_{PSEUDO})]^{-1} \hat{G}_n [H_n(\hat{\beta}_{PSEUDO})]^{-1}, \qquad (2.10)$$

which can be used to test any hypothesis of the form $H_0: C\beta^0 = \delta^*$. Under the null
hypothesis, by Moore (1977),

$$(C\hat{\beta}_{PSEUDO} - \delta^*)' [C \hat{A}_n C']^{-} (C\hat{\beta}_{PSEUDO} - \delta^*) \qquad (2.11)$$

converges in law to a chi-square distribution with $\nu = \mathrm{rank}(C \hat{A}_n C')$ degrees of
freedom. Here, $[C \hat{A}_n C']^{-}$ is any generalized inverse of $C \hat{A}_n C'$.
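
To make the sandwich form of (2.8)-(2.10) concrete, the sketch below works through the simplest possible case: a binary response, an intercept-only model (d = 1, k = 1) and unit sampling weights, so that every matrix is a scalar. The function name and the data layout (a list of clusters, each a list of 0/1 responses) are hypothetical, and the example data are invented purely to show how clustering inflates the variance relative to the independence-based value.

```python
def sandwich_variance(clusters, k=1):
    """Return (design-based variance, independence-based variance) of the
    fitted intercept for an intercept-only binary logit with unit weights."""
    n = len(clusters)                                  # number of clusters
    n_star = sum(len(c) for c in clusters)             # number of elements
    p = sum(sum(c) for c in clusters) / n_star         # fitted probability
    h = n_star * p * (1.0 - p)                         # scalar H_n at the estimate
    d = [sum(y - p for y in c) for c in clusters]      # cluster totals of residuals
    d_bar = sum(d) / n
    g = n / (n - 1) * sum((dj - d_bar) ** 2 for dj in d)   # as in (2.8)
    g *= (n_star - 1) / (n_star - k)                       # extra factor in (2.9)
    return g / h ** 2, 1.0 / h

# Four perfectly homogeneous clusters of size three (invented data).
robust, naive = sandwich_variance([[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]])
```

In this extreme example every cluster is internally identical, so the design-based variance is several times the independence-based one; with independent responses the two would be close on average.
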

The sums of squares and products matrix used in the construction of $\hat{G}_n$ is based on $n$
observations, where $n$ is the number of clusters. By analogy to the Hotelling $T^2$ statistic, it
is natural to adjust for degrees of freedom by multiplying (2.11) by the ratio

$$\frac{n - \nu}{\nu (n - 1)} \qquad (2.12)$$

to obtain an approximate $F$ statistic with $\nu$ and $n - \nu$ degrees of freedom. In our case, this
adjustment has the disadvantage that $\nu$ may exceed $n$ in a sample with a small number of clusters
but a large number of individual elements.

The covariance matrix constructed as if the elemental observations are a simple random
sample is biased, but it can be used to make a small sample adjustment in the estimated
covariance matrix. One might view the usual small sample degrees-of-freedom adjustment
as the operation of adding to an initial estimator of the covariance matrix the quantity
$(n - \nu)^{-1} (\nu - 1) V$, where $V$ is also an estimator of the covariance matrix. In the usual
case, $V$ is also the initial estimator. In our case, we make the adjustment using the covariance
matrix based on the elements as the second $V$. The use of the elemental covariance matrix
has the advantage that the resulting sum is always positive definite. The adjustment is a
function of the number of parameters estimated, $dk$. The adjustment is

(i) if $n > 3dk - 2$,

$$\tilde{A}_n = \hat{A}_n + (n - dk)^{-1} (dk - 1)\, \psi^* [H_n(\hat{\beta}_{PSEUDO})]^{-1}, \qquad (2.13)$$

(ii) if $n \le 3dk - 2$,

$$\tilde{A}_n = \hat{A}_n + 0.5\, \psi^* [H_n(\hat{\beta}_{PSEUDO})]^{-1}, \qquad (2.14)$$

where $\psi^* = \max(1, \mathrm{tr}\{[H_n(\hat{\beta}_{PSEUDO})]^{-1} \hat{G}_n\}/dk)$. The upper bound
of 0.5 for the correction in (2.14) is arbitrary. Then, an approximate $F$-test with $\nu$ and
$n - \nu$ degrees of freedom is obtained by substituting $\tilde{A}_n$ for $\hat{A}_n$ in (2.11) and
dividing the resulting quadratic form by $\nu$. In practice,
the approximate degrees of freedom can be taken to be $\nu$ and infinity.
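
The switch between (2.13) and (2.14) is easiest to see by computing the multiplier of the elemental covariance matrix directly. In the sketch below the function names are hypothetical; the only inputs are the number of clusters n, the number of estimated parameters dk, and the trace that appears in the scale factor defined after (2.14).

```python
def adjustment_coefficient(n, dk):
    """Multiplier applied to the scaled elemental covariance matrix."""
    if n > 3 * dk - 2:
        return (dk - 1) / (n - dk)   # case (2.13)
    return 0.5                       # case (2.14), the arbitrary upper bound

def scale_factor(trace_hinv_g, dk):
    """max(1, trace / dk): the scale factor is never allowed below one."""
    return max(1.0, trace_hinv_g / dk)

# With dk = 12 parameters the two cases meet at n = 3 * 12 - 2 = 34 clusters.
at_cutoff = adjustment_coefficient(34, 12)
just_above = adjustment_coefficient(35, 12)
```

At n = 3dk − 2 the expression (dk − 1)/(n − dk) equals exactly 0.5, so the coefficient is continuous in n and simply stops growing once the number of clusters falls to the cutoff.
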


3. A MONTE CARLO STUDY 


In this section a Monte Carlo study is conducted to examine properties of F-Tests (2.11) 
involving model parameters. Data are generated under two different sampling schemes that 
correspond to single-stage cluster sampling where the primary units all have the same sampling 
weight and are taken from an infinite population. In the first sampling scheme all the elements 
within the cluster have the same explanatory vector x and therefore, the same conditional mean 
(2.1). This is the case where the logistic regression becomes weighted in the sense of several 
responses y’s with the same covariate vector x. Different degrees of intra-class correlation are 
induced among the y’s belonging to the same cluster. 
The second sampling scheme, unlike the first, places different vectors of covariates for
different subjects within the cluster. The conditional mean (2.1) is also satisfied and different
degrees of intra-class correlation are controlled. The effect of the intra-class correlation is 
studied for both sampling schemes under three different estimation procedures: MLE where 
the clustering effect is completely ignored, TAYLOR where the large sample covariance matrix 
(2.10) is used, and CPLX where the adjusted covariance matrix (2.13-2.14) is employed. These 
last two procedures, for large samples, are asymptotically equivalent. For small samples CPLX 
performs better than TAYLOR. 


3.1 Sampling Scheme I 


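Suppose that $x_1, x_2, \ldots, x_n$ are $k$-dimensional independent and identically distributed
normal random vectors with mean vector $\mu$ and covariance matrix $\Sigma$. For each $j$,
$j = 1, 2, \ldots, n$, suppose that given $x_j$, the random vectors
$\tilde{y}^*_{j0}, \tilde{y}^*_{j1}, \ldots, \tilde{y}^*_{jm_j}$ are independent and identically
distributed multinomial random vectors, with parameters $(\pi^*_j, 1)$, where $\pi^*_j$ satisfies
the logistic function (2.1) evaluated at the true parameter vector $\beta^0$ and at $x = x_j$.
Let $U_{j1}, U_{j2}, \ldots, U_{jm_j}$ be a set of independent and identically distributed
uniform (0,1) random variables. For a known and fixed $\phi$, $0 < \phi < 1$, define

$$y^*_{j\ell} = \tilde{y}^*_{j0} \quad \text{if } U_{j\ell} \le \phi, \qquad (3.1.1)$$

and

$$y^*_{j\ell} = \tilde{y}^*_{j\ell} \quad \text{if } U_{j\ell} > \phi. \qquad (3.1.2)$$

It can be shown that within the $j$-th cluster,

$$E(y^*_{j\ell}) = \pi^*_j, \qquad (3.1.3)$$

$$\mathrm{Cov}(y^*_{j\ell}, y^*_{jt}) = \Delta(\pi^*_j) \quad \text{if } \ell = t, \qquad (3.1.4)$$

and

$$\mathrm{Cov}(y^*_{j\ell}, y^*_{jt}) = \phi^2 \Delta(\pi^*_j) \quad \text{if } \ell \ne t. \qquad (3.1.5)$$

Therefore, given $x_j$, the random vector $t_j = \sum_{\ell=1}^{m_j} y^*_{j\ell}$ does not have a
multinomial distribution. Instead

$$E(m_j^{-1} t_j) = \pi^*_j \qquad (3.1.6)$$

and

$$\mathrm{Var}(m_j^{-1} t_j) = [1 + \phi^2 (m_j - 1)]\, m_j^{-1} \Delta(\pi^*_j), \qquad (3.1.7)$$
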
where $\phi^2$ represents the intra-cluster correlation. Furthermore, if the $m_j$'s are constant,
i.e., $m_j = m$, the factor $\Phi = [1 + \phi^2 (m - 1)]$ corresponds to the design effect defined
by Kish (1965, p.258). An estimate of the design effect $\Phi$ is

$$\hat{\Phi} = (dk)^{-1} \Big\{ \sum_{i=1}^{dk} a_{(ii)} / h^{(ii)} \Big\} \bar{w}^{-1}, \qquad (3.1.8)$$

where $a_{(ii)}$ and $h^{(ii)}$ represent the $(i,i)$-th elements of $\tilde{A}_n$ in (2.13)-(2.14) and
$[H_n(\hat{\beta}_{PSEUDO})]^{-1}$, respectively, and $\bar{w}$ is the average of the sampling
weights for the entire sample.
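
The mixing mechanism of (3.1.1)-(3.1.2) and the resulting design effect can be checked by simulation. The sketch below is restricted to a binary response with a constant success probability (an assumed simplification of the multinomial, covariate-dependent setting in the text); the function names, the seed, the probability 0.3 and the number of replicates are arbitrary illustrative choices. Each unit copies a response shared by the whole cluster with probability φ and otherwise uses its own independent response, which should inflate the variance of the cluster mean by roughly 1 + φ²(m − 1).

```python
import random

def design_effect(phi, m):
    """Design effect 1 + phi^2 (m - 1) for clusters of constant size m."""
    return 1.0 + phi ** 2 * (m - 1)

def simulate_cluster_mean(p, phi, m, rng):
    """One cluster mean under the mixing scheme, binary case."""
    common = 1 if rng.random() < p else 0       # response shared by the cluster
    total = 0
    for _ in range(m):
        own = 1 if rng.random() < p else 0      # unit-specific response
        total += common if rng.random() <= phi else own
    return total / m

rng = random.Random(12345)
p, phi, m, reps = 0.3, 0.10 ** 0.5, 21, 20000   # phi^2 = 0.10, clusters of 21
means = [simulate_cluster_mean(p, phi, m, rng) for _ in range(reps)]
grand = sum(means) / reps
var_hat = sum((v - grand) ** 2 for v in means) / (reps - 1)
ratio = var_hat / (p * (1 - p) / m)             # empirical design effect
```

With m = 21 the intra-cluster correlations 0.05, 0.10 and 0.15 used in the study correspond to design effects of 2, 3 and 4.
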

Under this sampling scheme, data $(x_j, y^*_{j\ell})$, $j = 1, 2, \ldots, n$, $\ell = 1, 2, \ldots, m$,
were generated with $k = 4$, $d = 3$, $m = 21$, and parameters

$$\mu' = [\text{not legible in the source}], \qquad (3.1.9)$$

$$\Sigma = \mathrm{Diag}(0, 25, 25, 25), \qquad (3.1.10)$$

$$\beta^{0\prime}_1 = (-0.3, -0.1, 0.1, 0.2), \qquad (3.1.11)$$

$$\beta^{0\prime}_2 = (0.2, -0.2, -0.2, 0.1), \qquad (3.1.12)$$

and

$$\beta^{0\prime}_3 = [\text{partly illegible in the source, which reads ``(015.030 0.30.1)''}]. \qquad (3.1.13)$$

Based on (3.1.9)-(3.1.13), 1000 sets of samples with $n$ clusters of size $m$ were generated
according to (3.1.1)-(3.1.2) for different values of $n$, $\phi^2$, and $\Phi$. The estimated Type I
errors obtained from comparing the $F$-tests of $H_0: \beta = \beta^0$ against
$F(12, \infty; 0.05) = 1.753$ were computed under the three different estimation procedures: MLE,
CPLX and TAYLOR. A measure of the distortion of the estimated Type I errors relative to the
nominal 0.05 is the relative bias, which is defined as

$$(0.05)^{-1}\, |\text{Estimated Type I error} - 0.05|. \qquad (3.1.14)$$
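
The relative bias in (3.1.14) is a simple rescaling of the estimated Type I error, and the short sketch below shows how to move between the two. The function names are hypothetical, and the inverse assumes that the estimated error is above the nominal level, which is the situation of interest when clustering is ignored.

```python
def relative_bias(estimated, nominal=0.05):
    """Relative bias of an estimated Type I error, as in (3.1.14)."""
    return abs(estimated - nominal) / nominal

def implied_type1_error(rel_bias, nominal=0.05):
    """Invert the relative bias, assuming the estimate exceeds the nominal level."""
    return nominal * (1.0 + rel_bias)

# A relative bias of 18 corresponds to an estimated Type I error of 0.95.
error_at_18 = implied_type1_error(18.0)
```

Because the estimated error cannot exceed one, the relative bias at the 5% level is bounded above by 19.
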


Relative biases of the estimated Type I errors are reported in Table 3.1. For data generated
with no intra-class correlation ($\phi^2 = 0$), the MLE procedure, as expected, provides a
small relative bias of the estimated Type I error with respect to the nominal 5% level. CPLX
produces in this case relative biases slightly greater than MLE. This is the penalty of estimating
extra parameters in (2.13-2.14).

The MLE procedure shows a strong distortion of the estimated Type I error when a positive
intra-class correlation is present. This distortion increases as the intra-class correlation
$\phi^2$ gets bigger. In the case where $\phi^2 = 0.15$ ($\Phi = 4$) the relative bias of the
estimated Type I error is about 18, indicating an inflated Type I error of about 95%. For the CPLX procedure, the
Table 3.1

Relative Bias of the Estimated Type I Error for the F-test of $H_0: \beta = \beta^0$
with nominal 0.05 Level under Sampling Scheme I

Procedure
n $\phi^2$ $\Phi$ MLE CPLX TAYLOR
20 0.00 1 0.24 0.60 16.42 
20 0.05 2 9.66 3.68 17.06 
20 0.10 3 15.24 3.98 17.44 
20 0.15 4 17.74 4.00 17.70 
30 0.00 1 0.08 0.06 12.82 
30 0.05 2 9.84 1.20 13.74 
30 0.10 3 15.52 1.76 14.22
30 0.15 4 17.74 1.86 14.68
40 0.00 1 0.04 0.32 9.66 
40 0.05 2 9.98 0.82 9.62 
40 0.10 3 16.20 1.02 11.66 
40 0.15 4 17.74 1.80 11.66 
50 0.00 1 0.06 0.50 7.40 
50 0.05 2 9.76 1.44 8.38 
50 0.10 3 16.00 1.96 9.32 
50 0.15 4 17.80 2.20 9.70 
100 0.00 1 0.06 0.90 2.68 
100 0.05 2 10.02 1.66 3.90 
100 0.10 3 16.26 2.06 4.70 
100 0.15 4 17.78 2.24 5.10 
200 0.00 1 0.02 0.74 1.28 
200 0.05 2 10.46 1.00 1.64 
200 0.10 3 16.30 0.88 1.88 
200 0.15 4 18.00 1.52 [illegible]
400 0.00 1 0.02 0.44 0.70 
400 0.05 2 10.14 0.66 0.90 
400 0.10 3 16.56 0.64 1.00 
400 0.15 4 17.86 0.56 0.84 
800 0.00 1 0.08 0.32 0.40 
800 0.05 2 10.36 0.22 0.36 
800 0.10 3 16.04 0.68 0.80 
800 0.15 4 18.12 0.50 0.54 
relative bias decreases as the sample size increases from $n = 20$ to the cutting point of
correction (2.14), which is 34 in this case. Then it slightly increases as the sample size approaches
$n = 100$ and then decreases as the sample size keeps getting bigger. This pattern will be observed
throughout the whole simulation. It represents the effect of the correction (2.13-2.14) in small
samples.

The TAYLOR procedure has large relative biases when the sample sizes are small. They vary from
17 to 7 for sample sizes between $n = 20$ and $n = 50$. For large samples both methods, CPLX
and TAYLOR, provide similar results, as expected. In general, CPLX shows smaller relative biases
than the TAYLOR method.

If the $F$ statistics used for testing $H_0: \beta = \beta^0$ are multiplied by the number of
parameters being tested, the resulting statistic is asymptotically distributed as a chi-square
random variable with 12 degrees of freedom. The Monte Carlo means and variances for these
chi-square statistics are presented in Table 3.2.

As expected, the MLE method produces means and variances around 12 and 24, respectively,
when the design effect $\Phi$ is one. CPLX has in this case means around 12 with greater
variances that decrease when the sample size gets bigger. However, in the presence of any
intra-class correlation, the means and variances under MLE are too large, while CPLX shows
consistency with the asymptotic theory and the correction introduced in (2.13-2.14). The TAYLOR
method has extremely high variances when the sample size is small. A possible explanation
for this is that in some replications of the simulation the covariance matrix (2.10) was
ill-conditioned, producing very large quadratic forms for (2.11). This problem attenuates when the
sample size is bigger. Both methods, CPLX and TAYLOR, become asymptotically equivalent for
large samples.
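
A rough benchmark for the MLE columns of Table 3.2 can be obtained under a working assumption that is not stated in the text: if ignoring a constant design effect simply multiplies the quadratic form by that design effect, the statistic behaves like the design effect times a chi-square variable with 12 degrees of freedom. The sketch below, with a hypothetical function name, computes the implied mean and variance.

```python
def scaled_chi_square_moments(df, design_effect):
    """Mean and variance of design_effect * chi-square(df), under the
    working assumption described in the lead-in."""
    mean = design_effect * df
    variance = design_effect ** 2 * 2 * df
    return mean, variance

# Implied benchmarks for design effects 1 to 4 with 12 degrees of freedom.
benchmarks = {de: scaled_chi_square_moments(12, de) for de in (1, 2, 3, 4)}
```

Under this assumption the benchmarks are (12, 24), (24, 96), (36, 216) and (48, 384), which is the order of magnitude of the legible MLE entries in Table 3.2, for example a mean of 24.1 and a variance of 96.6 at n = 400 with a design effect of 2.
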

Monte Carlo properties for the estimator (3.1.8) of the design effect are presented in Table
3.3 for both CPLX and TAYLOR methods. The CPLX procedure shows smaller biases and
slightly larger standard errors. Both methods perform fairly well.

For each category $r$, $r = 1, 2, 3$ and each covariate $s$, $s = 1, 2, 3, 4$, "t" statistics for
the individual coefficient estimates were also computed as

$$\text{``}t\text{''} = [\widehat{\mathrm{Var}}(\hat{\beta}_{rs})]^{-1/2} (\hat{\beta}_{rs} - \beta^0_{rs}). \qquad (3.1.15)$$

The twelve "t" statistics provided by the CPLX estimation procedure were grouped together
and the simulated percentiles were computed. Similar computations were performed for the
MLE "t" statistics. Consequently, for each run the percentiles are based on 12,000 "t" values.
Once these percentiles were calculated, the relative biases were estimated as

$$(\text{Standard Normal Percentile})^{-1}\, |\text{Estimated Percentile} - \text{Standard Normal Percentile}|. \qquad (3.1.16)$$

The results of the relative bias for the estimated 5th and 95th percentiles for the "t" statistics
are presented in Table 3.4 for both MLE and CPLX procedures. Under the MLE it is expected
that these relative biases be close to $\Phi^{0.5} - 1$. This is true because the "t" statistics under
MLE are inflated by the factor $\Phi^{0.5}$. This is clearly seen in Table 3.4 under the two columns
for the MLE percentiles. The CPLX procedure has satisfactory relative biases for small samples.
These biases become negligible, as expected, when the sample sizes get bigger.
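
The expected relative bias of the MLE percentiles follows directly from the inflation factor mentioned above: a statistic inflated by the square root of the design effect has percentiles inflated by the same factor. The sketch below, with a hypothetical function name, evaluates the square root of the design effect minus one for the four design effects used in the study.

```python
import math

def expected_percentile_bias(design_effect):
    """Relative bias of a percentile when the statistic is inflated by
    the square root of the design effect."""
    return math.sqrt(design_effect) - 1.0

# Design effects 1, 2, 3 and 4 correspond to the study's four settings.
expected = [round(expected_percentile_bias(de), 2) for de in (1, 2, 3, 4)]
```

These values can be compared with the third column of Table 3.4 and with the simulated MLE relative biases beside it.
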
Table 3.2

Monte Carlo Properties of the Chi-square Statistic of $H_0: \beta = \beta^0$
under Sampling Scheme I

(The CPLX Mean, CPLX Variance and TAYLOR Mean columns are not legible in the source. The final
column below appears to be the TAYLOR Variance. [?] marks an illegible entry or exponent.)

                              MLE                 TAYLOR
n     $\phi^2$  $\Phi$    Mean    Variance    Variance
20    0.00      1         [?]     [?]         12x10^3
20    0.05      2         [?]     134.3       8x10^[?]
20    0.10      3         34.2    239.9       12x10^[?]
20    0.15      4         43.8    403.2       19x10^4
30    0.00      1         11.8    25.1        702.3
30    0.05      2         23.8    121.4       691.6
30    0.10      3         35.8    268.1       12x10^[?]
30    0.15      4         46.7    450.1       16x10^[?]
40    0.00      1         [?]     24.3        268.3
40    0.05      2         [?]     96.5        201.4
40    0.10      3         35.4    247.7       340.4
40    0.15      4         46.2    428.9       331.4
50    0.00      1         [?]     [?]         140.8
50    0.05      2         23.9    112.5       153.6
50    0.10      3         35.8    231.0       195.8
50    0.15      4         46.7    424.0       234.6
100   0.00      1         [?]     23.6        55.0
100   0.05      2         23.9    102.6       62.1
100   0.10      3         36.5    233.9       75.8
100   0.15      4         47.5    350.4       70.6
200   0.00      1         [?]     24.1        38.2
200   0.05      2         23.9    [?]         [?]
200   0.10      3         [?]     194.1       37.4
200   0.15      4         48.0    399.6       42.7
400   0.00      1         11.9    24.9        31.3
400   0.05      2         24.1    96.6        [?]
400   0.10      3         36.9    208.5       31.4
400   0.15      4         47.3    390.7       34.0
800   0.00      1         11.9    24.0        [?]
800   0.05      2         24.0    [?]         28.2
800   0.10      3         36.4    239.3       [?]
800   0.15      4         48.7    396.3       [?]

Table 3.3

Monte Carlo Properties of $\hat{\Phi}$ under Sampling Scheme I

Procedure
CPLX TAYLOR
n $\phi^2$ $\Phi$ Rel. Bias SE Rel. Bias SE
20 0.00 1 0.28 0.23 0.23 0.22 
20 0.05 2 0.01 0.63 0.35 0.48 
20 0.10 3 0.07 0.93 0.40 0.70 
20 0.15 4 0.15 1.15 0.46 0.85
30 0.00 1 0.33 0.22 0.17 0.20 
30 0.05 2 0.14 0.62 0.25 0.47 
30 0.10 3 0.08 0.88 0.30 0.66 
30 0.15 4 0.04 1.18 0.33 0.90 
40 0.00 1 0.26 0.18 0.14 0.18 
40 0.05 2 0.14 0.53 0.19 0.42 
40 0.10 3 0.10 0.83 0.22 0.67 
40 0.15 4 0.07 Te3 0.25 0.91 
50 0.00 1 0.18 0.18 0.11 0.17 
50 0.05 2 0.09 0.48 0.16 0.41 
50 0.10 3 0.07 0.75 0.18 0.64 
50 0.15 4 0.04 0.97 0.21 0.83 
100 0.00 1 0.07 0.13 0.06 0.13 
100 0.05 2 0.04 0.34 0.08 0.32
100 0.10 3 0.01 0.54 0.10 0.51 
100 0.15 4 0.01 0.69 0.11 0.65 
200 0.00 1 0.03 0.10 0.03 0.09 
200 0.05 2 0.02 0.25 0.04 0.24 
200 0.10 3 0.01 0.38 0.05 0.36
200 0.15 4 0.01 0.49 0.05 0.48 
400 0.00 1 0.01 0.07 0.01 0.07
400 0.05 2 0.01 0.19 0.02 0.19 
400 0.10 3 0.00 0.27 0.02 0.27 
400 0.15 4 0.00 0.37 0.02 0.37 
800 0.00 1 0.01 0.05 0.01 0.05 
800 0.05 2 0.00 0.13 0.01 0.13 
800 0.10 3 0.00 0.19 0.01 0.18 
800 0.15 4 0.00 0.24 0.01 0.24 
Table 3.4


Relative Bias of the Estimated 5th and 95th Percentiles for the ‘‘t’’ Statistics
for the Coefficient Estimates under Sampling Scheme I


Procedure
MLE CPLX
Percentile Percentile
n $\rho^2$ $\phi^{0.5} - 1$ 5th 95th 5th 95th
20 0.00 0.00 0.02 0.00 0.10 0.09 
20 0.05 0.41 0.40 0.38 0.04 0.02 
20 0.10 0.73 0.68 0.65 0.07 0.04 
20 0.15 1.00 0.84 0.79 0.07 0.04 
30 0.00 0.00 0.00 0.02 0.10 0.09 
30 0.05 0.41 0.43 0.38 0.01 0.02 
30 0.10 0.73 0.73 0.70 0.02 0.01 
30 0.15 1.00 0.97 0.91 0.01 0.01 
40 0.00 0.00 0.01 0.01 0.07 0.08 
40 0.05 0.41 0.38 0.41 0.03 0.02 
40 0.10 0.73 0.70 0.72 0.03 0.01 
40 0.15 1.00 0.96 0.93 0.01 0.03 
50 0.00 0.00 0.01 0.01 0.05 0.07 
50 0.05 0.41 0.43 0.40 0.00 0.01 
50 0.10 0.73 0.71 0.70 0.01 0.00 
50 0.15 1.00 0.97 0.96 0.02 0.01 
100 0.00 0.00 0.00 0.02 0.01 0.00 
100 0.05 0.41 0.42 0.42 0.02 0.01 
100 0.10 0.73 0.71 0.74 0.01 0.03 
100 0.15 1.00 1.03 0.99 0.04 0.04 
200 0.00 0.00 0.01 0.01 0.00 0.00 
200 0.05 0.41 0.42 0.43 0.01 0.01 
200 0.10 0.73 0.71 0.72 0.01 0.01 
200 0.15 1.00 1.00 1.00 0.02 0.02 
400 0.00 0.00 0.01 0.01 0.01 0.01 
400 0.05 0.41 0.39 0.40 0.01 0.00 
400 0.10 0.73 0.76 Oi, 0.03 0.04 
400 0.15 1.00 1.02 0.89 0.02 0.00 
800 0.00 0.00 0.00 0.01 0.00 0.01 
800 0.05 0.41 0.43 0.44 0.01 0.02 
800 0.10 0.73 0.76 0.70 0.02 0.01 


800 0.15 1.00 1.07 1.04 0.04 0.02 
3.2 Sampling Scheme II


Let $x_1, x_2, \ldots, x_n$ be a set of k-dimensional independent and identically distributed normal
random vectors with mean vector $\mu_x$ and covariance matrix $\Sigma_B$. These vectors represent
cluster means for the explanatory variables in the logistic function (2.1). Suppose that for the
j-th cluster, $j = 1, 2, \ldots, n$, the vectors $x_{j0}, x_{j1}, \ldots, x_{jm_j}$ are independent and identically
distributed normal random vectors with mean vector $x_j$ and covariance matrix $\Sigma_W$. Given
$x_{j\ell}$, $\ell = 0, 1, \ldots, m_j$, the (d + 1)-dimensional random vector $y_{j\ell}$ has a multinomial
distribution with parameters $(\pi_{j\ell}, 1)$, where the elements of $\pi_{j\ell}$ satisfy the logistic function
(2.1) evaluated at the true parameter vector $\beta^0$ and at $x = x_{j\ell}$. Furthermore, suppose that
given the $x_{j\ell}$'s, the $y_{j\ell}$'s are independent.

Let $U_{j1}, U_{j2}, \ldots, U_{jm_j}$ be $m_j$ independent and identically distributed uniform (0,1) random
variables that are also jointly independent of the $x_{j\ell}$'s and of the $y_{j\ell}$'s. Let $\rho$ be a fixed
and known number, $0 < \rho < 1$. Then define $(x^*_{j\ell}, y^*_{j\ell})$, $\ell = 1, 2, \ldots, m_j$, in the following
way:

$$(x^*_{j\ell}, y^*_{j\ell}) = (x_{j0}, y_{j0}) \quad \text{if } U_{j\ell} \le \rho \tag{3.2.1}$$

and

$$(x^*_{j\ell}, y^*_{j\ell}) = (x_{j\ell}, y_{j\ell}) \quad \text{if } U_{j\ell} > \rho. \tag{3.2.2}$$

Observe that within each cluster, the $x^*_{j\ell}$'s all have the same vector of conditional means $x_j$
and that the covariance matrix between $x^*_{j\ell}$ and $x^*_{jt}$ is $\Sigma_W$ if $\ell = t$ and $\rho^2 \Sigma_W$ otherwise. Also,
note that the conditional mean of each $y^*_{j\ell}$ is the logistic function (2.1) evaluated at $\beta^0$ and
$x = x^*_{j\ell}$, and that the vectors $(x^*_{j\ell}, y^*_{j\ell})$, $\ell = 1, 2, \ldots, m_j$, exhibit an intra-class correlation of
$\rho^2$ and an approximate design effect of $\phi = [1 + \rho^2 (m - 1)]$ when all the $m_j$'s are equal to a
common value $m$.

Data $(x^*_{j\ell}, y^*_{j\ell})$, $j = 1, 2, \ldots, n$, $\ell = 1, 2, \ldots, m_j$, were generated under this cluster
sampling scheme with k = 4, d = 3, and parameters

$$\mu_x = (1, -6, 4, 8)', \tag{3.2.3}$$

$$\Sigma_B = \mathrm{Diag}(0, 25, 25, 49), \tag{3.2.4}$$

$$\Sigma_W = \mathrm{Diag}(0, 25, 36, 36), \tag{3.2.5}$$

$$\beta_1^0 = (0.30, -0.05, -0.06, 0.08), \tag{3.2.6}$$

$$\beta_2^0 = (0.06, -0.08, -0.10, 0.07), \tag{3.2.7}$$

and

$$\beta_3^0 = (0.70, -0.08, -0.10, 0.11). \tag{3.2.8}$$

Based on (3.2.3)-(3.2.8), 1000 sets of samples with n clusters of size $m_j = m = 6$ were
generated according to (3.2.1)-(3.2.2) for different values of n, $\rho^2$ and $\phi$. The relative biases
defined in (3.1.14) of the estimated Type I errors from comparing the F-tests of $H_0\colon \beta = \beta^0$
against $F(12, \infty; 0.05) = 1.753$ are presented in Table 3.5 under three different estimation
techniques: MLE, CPLX and TAYLOR.
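The generation mechanism of Sampling Scheme II can be sketched in a few lines. The code below draws clusters according to (3.2.1)-(3.2.2) with the parameter values (3.2.3)-(3.2.8). It is only an illustration under stated assumptions: the function names are hypothetical, the fourth response category is taken as the reference category of the logistic function (the exact form of (2.1) is not reproduced in this section), and the copy probability is taken to be $\rho$, so that an intra-class correlation of 0.2 corresponds to $\rho = \sqrt{0.2}$.

```python
import math
import random

MU = [1.0, -6.0, 4.0, 8.0]      # mean vector (3.2.3); first entry is the intercept
SD_B = [0.0, 5.0, 5.0, 7.0]     # square roots of Diag(0, 25, 25, 49), (3.2.4)
SD_W = [0.0, 5.0, 6.0, 6.0]     # square roots of Diag(0, 25, 36, 36), (3.2.5)
BETA = [                        # coefficient vectors (3.2.6)-(3.2.8)
    [0.30, -0.05, -0.06, 0.08],
    [0.06, -0.08, -0.10, 0.07],
    [0.70, -0.08, -0.10, 0.11],
]

def category_probs(x):
    """Probabilities of the d + 1 = 4 categories.

    Assumption: the last category is the reference with linear predictor 0.
    """
    etas = [sum(b * v for b, v in zip(beta, x)) for beta in BETA] + [0.0]
    top = max(etas)                              # guard against overflow
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

def draw_unit(center, rng):
    """One (x, y) pair around the cluster mean `center`."""
    x = [rng.gauss(c, s) for c, s in zip(center, SD_W)]
    y = rng.choices(range(4), weights=category_probs(x))[0]
    return x, y

def draw_cluster(m, rho, rng):
    """m observed units of one cluster, following (3.2.1)-(3.2.2)."""
    center = [rng.gauss(mu, s) for mu, s in zip(MU, SD_B)]
    shared = draw_unit(center, rng)              # the common unit (x_j0, y_j0)
    units = []
    for _ in range(m):
        own = draw_unit(center, rng)             # the unit's own (x_jl, y_jl)
        units.append(shared if rng.random() <= rho else own)
    return units

def design_effect(rho_sq, m):
    """Approximate design effect 1 + rho^2 (m - 1) for constant cluster size m."""
    return 1.0 + rho_sq * (m - 1)

rng = random.Random(0)
sample = [draw_cluster(6, math.sqrt(0.2), rng) for _ in range(20)]   # n = 20 clusters
print(len(sample), design_effect(0.2, 6))
```

With m = 6, intra-class correlations of 0.0, 0.2, 0.4 and 0.6 give approximate design effects of 1, 2, 3 and 4, which are the values listed in the tables for this scheme.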

In the presence of intra-class correlation, there is a strong distortion of the Type I error for
MLE even when $\rho^2$ is relatively small ($\rho^2 = 0.2$) for cluster size m = 6. This
distortion is reflected in the relative bias, which ranges from approximately 7 to 18. These values
indicate inflated Type I errors between 40% and 95%. The CPLX procedure provides satisfactory
relative biases even for small samples. The TAYLOR procedure gives values that are too
high for small samples, but it becomes equivalent to CPLX for large samples. Once again,
CPLX seems to be superior to TAYLOR when the sample size is small.
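The link between the relative biases and the Type I errors quoted above can be checked with a line of arithmetic. The sketch assumes that (3.1.14), which is defined earlier in the paper, measures the relative bias as the difference between the estimated and nominal Type I errors divided by the nominal level $\alpha = 0.05$. Under that reading an inflated error rate $\hat{\alpha}$ satisfies

```latex
\[
\hat{\alpha} = \alpha\,(1 + \text{relative bias}), \qquad
0.05\,(1 + 7) = 0.40, \qquad 0.05\,(1 + 18) = 0.95 ,
\]
```

which agrees with the stated range of 40% to 95%.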


Table 3.5 


Relative Bias of the Estimated Type I Error for the F-test of $H_0\colon \beta = \beta^0$
with Nominal 0.05 Level under Sampling Scheme II


Procedure
n $\rho^2$ $\phi$ MLE CPLX TAYLOR
20 0.0 1 0.54 0.46 13.52 
20 0.2 2 7.30 0.46 12.96 
20 0.4 3 13.70 0.68 13.96 
20 0.6 4 17.08 0.60 14.72 
30 0.0 1 0.28 0.78 7.78 
30 0.2 2 8.72 0.72 8.16 
30 0.4 3 14.84 0.72 9.32 
30 0.6 4 17.50 0.82 9.23 
40 0.0 1 0.36 0.56 5.16 
40 0.2 2 9.28 0.56 5.76 
40 0.4 3 15.38 0.64 5.84 
40 0.6 4 17.76 0.70 5.80 
50 0.0 1 0.44 0.56 3.44 
50 0.2 2 9.34 0.08 4.86 
50 0.4 3 15.48 0.38 4.36 
50 0.6 4 17.56 0.46 4.16 
100 0.0 1 0.16 0.04 1.26 
100 0.2 2 9.46 0.26 1.46 
100 0.4 3 15.94 0.44 2.00 
100 0.6 4 18.16 0.14 1.46 
200 0.0 1 0.10 0.26 0.76 
200 0.2 2 10.20 0.34 0.82
200 0.4 3 16.22 0.02 0.48 
200 0.6 4 18.06 0.06 0.52 
Table 3.6


Monte Carlo Properties of the Chi-square Statistic of $H_0\colon \beta = \beta^0$
under Sampling Scheme II


Procedure
MLE CPLX TAYLOR
n $\rho^2$ $\phi$ Mean Variance Mean Variance Mean Variance
20 0.0 1 1143 18.9 10.2 19.7 40.5 15x10? 
20 0.2 2 20.3 62.8 10.5 21.4 39.2 11x10?
20 0.4 3 28.3 106.4 10.5 18.4 111.3 42x10° 
20 0.6 4 S5nk 152.6 10.3 18.2 11x10? 50x109 
30 0.0 1 11.6 21.6 9.4 16.3 22.0 147.3 
30 0.2 2 21.8 fee? 4 9.9 i i ZZal 161.2 
30 0.4 3 30.4 117.6 9.8 16.5 24.3 224.6 
30 0.6 4 39.3 191.0 9.5 14.5 24x10? 60x 108 
40 0.0 1 11.6 4 9.9 19.4 18.1 86.7 
40 0.2 2 22.4 76.5 10.4 18.3 18.9 80.8
40 0.4 3 31.8 | Fa 102 17.8 19.2 90.4 
40 0.6 4 41.4 PORE) 10.1 16.9 19.3 104.4 
50 0.0 1 11.5 19.9 10.6 20.0 16.1 56.9
50 0.2 2 PBI 80.6 11.4 23.9 | 70.9 
50 0.4 3 3265 160.1 ial 22.9 17.4 dadeh 
50 0.6 4 41.7 262.3 10.7 19.7 17.0 63.8 
100 0.0 1 11.8 PAS 11.8 PEI 13.9 36.2 
100 0.2 2 22.9 87.3 11.9 27.0 14.0 38.5 
100 0.4 3 34.7 191.8 1223 27.9 14.4 40.7 
100 0.6 4 45.1 PAS TP 12.0 25.0 14.1 ae 
200 0.0 1 12.0 23.8 12.1 26.3 13.0 30.3 
200 0.2 2 24.0 88.6 12.4 25.9 Lan 30.0 
200 0.4 3 34.5 175.2 12.0 23.3 12.8 27.0
200 0.6 4 46.8 320.0 lez 24.0 13.0 27.9 


Monte Carlo properties of the chi-square statistics of $H_0\colon \beta = \beta^0$ (chi-square = 12 x F)
are presented in Table 3.6 for the three estimation procedures under study. CPLX shows means
and variances slightly below 12 and 24, respectively, when the sample sizes are small. This
underestimation vanishes as the sample size increases. The TAYLOR procedure gives means
and variances that are too large when the sample size is small. For instance, for $\rho^2 = 0.6$, the
variance is in the order of billions when n is 30 or less. For large samples, CPLX and
TAYLOR seem to provide similar results. The MLE method has acceptable results only when
$\rho^2 = 0.00$. Otherwise the estimated means and variances are too large.
Table 3.7
Monte Carlo Properties of $\hat{\phi}$ under Sampling Scheme II
Procedure
CPLX TAYLOR
Rel. Rel.
n $\rho^2$ $\phi$ Bias S.E. Bias S.E.
20 0.0 1 0.48 0.22 0.04 0.20 
20 0.2 2 0.16 0.53 0.26 0.42
20 0.4 3 0.05 0.87 0.34 0.72
20 0.6 4 0.01 1.24 0.39 1.03 
30 0.0 1 0.49 0.18 0.02 0.16 
30 0.2 2 0.25 0.48 0.19 0.40 
30 0.4 3 0.19 0.84 0.24 0.69 
30 0.6 4 0.16 1.12 0.27 0.94 
40 0.0 1 0.38 0.16 0.02 0.14 
40 0.2 2 0.22 0.45 0.14 0.38 
40 0.4 3 0.16 0.70 0.20 0.60 
40 0.6 4 0.16 0.98 0.19 0.86 
50 0.0 1 0.27 0.14 0.02 0.13 
50 0.2 2 0.15 0.42 0.12 0.37
50 0.4 3 0.12 0.67 0.15 0.60 
50 0.6 4 0.11 0.89 0.16 0.81 
100 0.0 1 0.12 0.10 0.01 0.10 
100 0.2 2 0.06 0.32 0.07 0.31
100 0.4 3 0.05 0.50 0.07 0.48 
100 0.6 4 0.06 0.59 0.07 0.57 
200 0.0 1 0.05 0.07 0.01 0.07 
200 0.2 2 0.03 0.24 0.03 0.23 
200 0.4 3 0.02 0.34 0.04 0.33 
200 0.6 4 0.02 0.40 0.03 0.40 


Monte Carlo properties for the estimator of the design effect proposed in (3.1.8) are 
presented in Table 3.7 under the CPLX and TAYLOR procedures. The TAYLOR procedure 
seems to perform slightly better than CPLX for small samples. Both procedures, in general, 
provide reasonable values. They seem to be equivalent for large samples. 
Table 3.8


Relative Bias of the Estimated 5th and 95th Percentiles for the ‘‘t’’ Statistics
for the Coefficient Estimates under Sampling Scheme II


Procedure
MLE CPLX
Percentile Percentile
n $\rho^2$ $\phi^{0.5} - 1$ 5th 95th 5th 95th
20 0.0 0.00 0.01 0.00 0.15 0.18 
20 0.2 0.41 0.37 0.32 0.06 0.09
20 0.4 0.73 0.63 0.57 0.02 0.05 
20 0.6 1.00 0.79 0.74 0.05 0.05 
30 0.0 0.00 0.02 0.00 0.15 0.16 
30 0.2 0.41 0.39 0.38 0.10 0.10 
30 0.4 0.73 0.68 0.63 0.07 0.08 
30 0.6 1.00 0.91 0.86 0.05 0.07 
40 0.0 0.00 0.01 0.00 0.12 0.15 
40 0.2 0.41 0.39 0.40 0.10 0.06 
40 0.4 0.73 0.65 0.60 0.07 0.09 
40 0.6 1.00 0.99 0.89 0.04 0.05 
50 0.0 0.00 0.01 0.01 0.10 0.10 
50 0.2 0.41 0.39 0.40 0.05 0.04 
50 0.4 0.73 0.73 OW? 0.02 0.01 
50 0.6 1.00 1.00 0.95 0.00 0.01 
100 0.0 0.00 0.01 0.01 0.04 0.05 
100 0.2 0.41 0.40 0.37 0.02 0.02
100 0.4 0.73 0.72 Ons 0.00 0.00 
100 0.6 1.00 1.00 1.02 0.01 0.02 
200 0.0 0.00 0.02 0.01 0.00 0.01 
200 0.2 0.41 0.40 0.45 0.01 0.02 
200 0.4 0.73 0.71 0.68 0.01 0.01


200 0.6 1.00 1.03 0.95 0.02 0.02 


The relative biases (3.1.16) of the 5th and 95th percentiles of the ‘‘t’’ statistics (3.1.15) are
presented in Table 3.8 under the MLE and CPLX procedures. MLE has a relative bias, as
expected, close to zero in the absence of intra-class correlation. This bias increases as
$\rho^2$ gets bigger. On the other hand, CPLX has a small relative bias in general, and for large
samples this bias becomes negligible.
4. EXTENSION TO STRATIFIED SAMPLING AND
MORE COMPLEX DESIGNS


A generalization of the CPLX procedure to stratified sampling can be made as follows. Suppose
that the population has been divided into L strata, indexed by $i = 1, 2, \ldots, L$. Let $m_{ij}$ represent
the size of the j-th cluster in the i-th stratum, $n_i$ the number of clusters selected in the i-th stratum,
and $y_{ij\ell}$ the multinomial response of the $\ell$-th element in the j-th cluster in the i-th stratum,
$\ell = 1, 2, \ldots, m_{ij}$, $j = 1, 2, \ldots, n_i$, $i = 1, 2, \ldots, L$. It is assumed that $\pi_{ij\ell}$, the expected value
of $y_{ij\ell}$, satisfies the logistic relationship (2.1) for a given explanatory vector $x_{ij\ell}$.

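A consistent estimator of $\beta^0$, say $\hat{\beta}_{\mathrm{PSEUDO}}$, can be found by maximizing the function

$$\sum_{i=1}^{L} \sum_{j=1}^{n_i} \sum_{\ell=1}^{m_{ij}} w_{ij}\, (\log \pi_{ij\ell})'\, y_{ij\ell}. \tag{4.1}$$

Algorithm (2.5) is performed with three indexes $i$, $j$, $\ell$. The adjustment given by (2.13) and (2.14)
is applied with

$$n = \sum_{i=1}^{L} n_i, \tag{4.2}$$

$$H_n(\hat{\beta}_{\mathrm{PSEUDO}}) = \sum_{i=1}^{L} \sum_{j=1}^{n_i} \sum_{\ell=1}^{m_{ij}} w_{ij}\, \Delta(\hat{\pi}_{ij\ell}) \otimes x_{ij\ell}\, x_{ij\ell}', \tag{4.3}$$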
$$d_{ij} = \sum_{\ell=1}^{m_{ij}} w_{ij}\, (y_{ij\ell} - \hat{\pi}_{ij\ell}) \otimes x_{ij\ell}, \tag{4.5}$$

$$\bar{d}_{i\cdot} = n_i^{-1} \sum_{j=1}^{n_i} d_{ij}, \tag{4.6}$$

f; = sampling rate of i-th stratum, and (4.7) 


n* = 


ee 


Ee (4.8) 


The estimation procedure can be extended in a stepwise manner to multi-stage sampling 
designs by maximizing (4.1) up to elemental units. The summation of (4.3) should be extended 
in order to include all the final sampling units. The key part is (4.4). The construction of G 
must be based on the complex survey. This could be a difficult task for multi-stage sampling. 
Results for stratified two-stage sampling are presented in Fuller et al. (1986, p. 82).
5. SUMMARY


In this paper, we have outlined a methodology for obtaining asymptotically normal estimators
of the parameters of a generalized logistic function involving a multinomial response variable
under complex survey designs. A consistent estimator of the asymptotic covariance matrix under
the complex sampling design is (2.10), which results from the usual Taylor series expansion.
For large samples, this covariance matrix produces correct Type I errors for the F-tests involving
model parameters. More importantly, it is shown that the correction (2.13)-(2.14) provides a
covariance matrix that reduces the small sample bias. This adjusted covariance matrix has some
important characteristics:


1. It levels off the inflated Type I error that results from ignoring the complex survey
design faster than the usual delta method does.

2. It is positive definite when $H_n(\hat{\beta}_{\mathrm{PSEUDO}})$ is positive definite, regardless of whether (2.9) is
singular.

3. It is asymptotically equivalent to (2.10).


The results of a Monte Carlo study were reported in Section 3. Data satisfying the logistic
conditional mean (2.1) were generated under two different single-stage cluster sampling
schemes. Among other things, the study examined the effect of the intra-class correlation and the
design effect on the relative biases of the estimated Type I errors for the F-tests of $H_0\colon \beta = \beta^0$.
The simulation showed, as expected, a strong relative bias when the naive maximum likelihood
method is employed. For small samples, the Monte Carlo results favor the use of the adjusted
covariance matrix over the one that arises from the usual delta method.


ACKNOWLEDGEMENTS 


This work was begun while the author was a student at Iowa State University. I thank 
Professor Wayne A. Fuller for introducing me to the topic and for suggesting a number of 
the small sample modifications that were incorporated into the estimation procedure. The 
author wants also to thank the referees for useful comments. 


REFERENCES 


ALBERT, A., and LESAFFRE, E. (1986). Multiple group logistic discrimination. Computers and 
Mathematics with Applications, 12A, 209-224. 


BEDRICK, E.J. (1983). Adjusted chi-square tests for cross-classified tables of survey data. Biometrika, 
70, 591-595. 


BINDER, D.A. (1983). On the variance of asymptotically normal estimators from complex surveys.
International Statistical Review, 51, 279-292.


BINDER, D.A., GRATTON, M.A., HIDIROGLOU, M.A., KUMAR, S., and RAO, J.N.K. (1984). 
Analysis of categorical data from surveys with complex designs: some Canadian experiences. Survey 
Methodology, 10, 141-156. 


BULL, S.B., and PEDERSON, L.L. (1987). Variance for polychotomous logistic regression using 
complex survey data. Proceedings of the Section on Survey Research Methods, American Statistical 
Association. 


Survey Methodology, December 1989 223 


CHAMBLESS, L.E., and BOYLE, K.E. (1985). Maximum likelihood methods for complex sample data: 
Logistic regression and discrete proportional hazards models. Communications in Statistics, Theory 
and Methods, 14, 1377-1392. 


COX, D.R. (1970). The Analysis of Binary Data. London: Methuen. 


DALE, J.R. (1986). Asymptotic normality of goodness-of-fit statistics for sparse product multinomials. 
Journal of the Royal Statistical Society, Ser. B, 48, 48-59. 


FAY, R.E. (1985). A jackknife chi-squared test for complex samples. Journal of the American Statistical 
Association, 80, 148-157. 


FULLER, W.A., KENNEDY, W., SCHNELL, D., SULLIVAN, G., and PARK, H.J. (1986). PC 
CARP. Statistical Laboratory, Iowa State University, Ames, Iowa. 


GALLANT, A.R. (1987). Nonlinear Statistical Methods. New York: John Wiley & Sons.


HABERMAN, S.J. (1974). The Analysis of Frequency Data. Chicago: The University of Chicago Press.


HOLT, D., SCOTT, A.J., and EWINGS, P.D. (1980). Chi-squared tests with survey data. Journal of 
the Royal Statistical Society, Ser. A, 143, 303-320. 


JENNRICH, R.I., and MOORE, R.H. (1975). Maximum likelihood estimation by means of nonlinear 
least squares. Proceedings of the Section on Statistical Computing, American Statistical Association. 


KISH, L. (1965). Survey Sampling. New York: John Wiley & Sons. 


MOORE, D.S. (1977). Generalized inverses, Wald’s method, and the construction of chi-squared tests 
of fit. Journal of the American Statistical Association, 72, 131-137. 


MOREL, J. (1987). Multivariate nonlinear models for vectors of proportions: A generalized least squares
approach. Unpublished Ph.D. dissertation. Iowa State University, Ames, Iowa.


NELDER, J.A., and WEDDERBURN, R.W.M. (1972). Generalized linear models. Journal of the Royal 
Statistical Society, Ser. A, 135, 370-384. 


RAO, J.N.K., and SCOTT, A.J. (1981). The analysis of categorical data from complex sample surveys: 
Chi-squared tests for goodness-of-fit and independence in two-way tables. Journal of the American 
Statistical Association, 76, 221-230. 


RAO, J.N.K., and SCOTT, A.J. (1984). On chi-squared tests for multiway contingency tables with cell 
proportions estimated from survey data. The Annals of Statistics, 12, 46-60. 


ROBERTS, G., RAO, J.N.K., and KUMAR, S. (1987). Logistic regression analysis of sample survey 
data. Biometrika, 74, 1-12. 


SCHNELL, D., KENNEDY, W.J., SULLIVAN, G., PARK, H.J., and FULLER, W.A. (1988). 
Personal computer variance software for complex surveys. Survey Methodology, 14, 59-69. 




Survey Methodology, December 1989 225 
Vol. 15, No. 2, pp. 225-235 
Statistics Canada 


Randomized Response Sampling from 
Dichotomous Populations with 
Continuous Randomization 


LeROY A. FRANKLIN¹


ABSTRACT 


A randomized response model for sampling from dichotomous populations is developed in this paper.
The model permits the use of continuous randomization and multiple trials per respondent. The special
case of randomization with normal distributions is considered, and a computer simulation of such a
sampling procedure is presented as an initial exploration into the effects such a scheme has on the amount
of information in the sample. A portable electronic device that would implement the presented model
is discussed. The results of a study conducted using the electronic randomizing device are presented.
The results show that randomized response sampling is a superior technique to direct questioning for
at least some sensitive questions.


KEY WORDS: Randomized response; Randomization with continuous distributions; Computer 
simulation. 


1. INTRODUCTION 


Surveys often seek to estimate the proportion of individuals satisfying a particular condition.
If the condition involves a highly personal or controversial subject (e.g. seeking new
employment, sexual behavior) or is of an illegal nature (e.g. drug usage, criminal activities), survey
respondents may be reluctant to answer honestly or may refuse to answer a direct question
as to whether they satisfy the condition of interest. In such cases, it is difficult to make inferences
about proportions on the basis of a survey in which sensitive questions are asked directly.

Randomized response sampling plans utilize a stochastic or randomizing device to enable 
respondents to provide answers to sensitive questions without fully revealing information 
regarding the sensitive issue. The actual outcome of the device for a particular respondent is 
observed by the respondent but not by the interviewer. However, the properties of the device 
are known to the experimenter, and this enables the experimenter to make inferences about 
the proportion of interest without knowing specifically about any single individual. The 
stochastic device introduces noise into the information-gathering process, but the resulting loss 
of information may be preferable to the uncontrollable noise introduced by nonresponse or 
lying when direct questions are used. 

The original randomized response model was proposed by Warner (1965) and involved a 
dichotomous randomization for a dichotomous population. His model was studied from a 
Bayesian viewpoint in Winkler and Franklin (1979). The randomized response model with two 
or more trials per respondent was introduced by Gould, Shah and Abernathy (1969) and fur- 
ther developed by Liu and Chow (1976). Both papers demonstrated the superiority of the 
multiple trials per respondent in improving the efficiency of the estimate over the single trial 
model of Warner’s. However, both also note that multiple trials might produce simultaneously 


¹ Dr. LeRoy A. Franklin, Department of System and Decision Sciences, Indiana State University, School of Business,
Terre Haute, Indiana 47809. 




growing suspicion and lowered ‘‘truth telling’’ over the single trial model. The survey paper 
prepared by Horvitz, Greenberg, and Abernathy (1976) discusses several other plans with 
discrete randomization devices. In addition a thorough theoretical development and review 
of results is contained in the recent volume by Chaudhuri and Mukerjee (1988) entitled ‘‘Ran- 
domized Response: Theory and Techniques.’’ A more general model, using either discrete or 
continuous randomization, is presented in Warner (1971) and these more general models were 
discussed from a Bayesian viewpoint by Pitz (1980), Smouse (1984), and O'Hagan (1987). A
few surveys have actually been undertaken, some showing that randomized response methods
are superior to direct survey methods (e.g. Gould et al. 1969 and Liu and Chow 1976) and a
few others giving uncertain results (e.g. Brewer 1981). However, only Poole (1974) developed a
specific continuous randomization distribution (uniform) to estimate a continuous distribu- 
tion and this was implemented by having respondents report their answer multiplied by a 
number chosen randomly from a random number table. 

In this paper, we consider a randomized response model for sampling from a dichotomous 
population, but using a continuous randomization distribution. With Warner’s original ran- 
domized response technique, the randomizing device determines which question the respon- 
dent answers. But with the method developed in this paper, the question for a respondent is 
fixed by whether or not he belongs to the sensitive group. The randomization here chooses 
values from two distributions (one for ‘‘yes’’ and the other for ‘‘no’’) and the respondent 
provides the value appropriate to his group membership. Multiple trials are incorporated into 
the model by having the respondent provide a single multi-digit response. This provides a 
potential benefit over usual multiple trial techniques in that the respondent perceives he/she 
has provided just one answer when in fact the multi-digit response incorporates several trials 
of the respondent. 

The general model, for which the randomization can be handled via any type of distribu- 
tion, is presented in Section 2. The special case in which the randomization involves normal 
distributions is discussed in Section 3, along with an approximating procedure for assessing 
the effect of randomization and multiple trials per respondent. Section 4 presents a computer 
simulation investigating the role that specific choices of means and standard deviations play 
in the efficiency of surveying by using normal distribution randomization with multiple trials. 
Section 5 presents a way of implementing normal distributions as the randomizing distribu- 
tion through the use of a computerized, electronic device that generates and displays random 
normal values. Such a device was felt to be potentially superior to ‘‘drawing cards’’ or ‘‘flipping
a spinner’’ since these methods may not be properly implemented by the respondent or
the interviewer. The results of a survey taken using that electronic device to investigate five 
sensitive questions are examined in Section 6. Finally, a summary and a brief discussion of 
design issues are considered in Section 7. 


2. THE MODEL 


Suppose that we are interested in $\theta$, the proportion of individuals belonging to Group A
among the members of a particular population. A simple random sample of n individuals is
chosen from the population with $n \ge 1$, where we assume that the population is large enough
relative to n so that the sampling process can be viewed effectively as sampling with replacement.
A total of k trials are conducted with each respondent, where $k \ge 1$. On trial j for respondent
i, random values are drawn from the distribution functions $G_{ij}$ and $H_{ij}$. The respondent
sees both values and is asked to report the value from $G_{ij}$ if he or she belongs to Group A and
the value from $H_{ij}$ otherwise. The researcher knows the exact form of $G_{ij}$ and $H_{ij}$ but sees only
the value reported by the respondent, denoted by $z_{ij}$, and thus does not know from which
distribution it came.

Inferences must be made about $\theta$ based on the kn sample observations $z_{ij}$, with $i = 1, \ldots, n$
and $j = 1, \ldots, k$. For convenience, we assume in the remainder of this paper that $G_{ij}$ and
$H_{ij}$ are absolutely continuous with corresponding densities $g_{ij}$ and $h_{ij}$; the development for the
discrete case is analogous. The conditional density function of $z_{ij}$ given $\theta$ is $\theta\, g_{ij}(z_{ij}) +
(1 - \theta)\, h_{ij}(z_{ij})$, and the likelihood function for the entire experiment is:

$$L(z \mid \theta) = \prod_{i=1}^{n} \left[ \theta \prod_{j=1}^{k} g_{ij}(z_{ij}) + (1 - \theta) \prod_{j=1}^{k} h_{ij}(z_{ij}) \right] \quad \text{for } 0 \le \theta \le 1, \tag{2.1}$$

where $z = (z_1, \ldots, z_n)$ and $z_i = (z_{i1}, \ldots, z_{ik})$.
Expanding the likelihood function using the binomial theorem allows the likelihood func- 
tion to be written in the form 


n 
L(z| 6) = Y) a, 6'(1 — 6)"~‘ where 0 < 6 < Land (2.2) 
t=0 


c k k 
a=) | TI I] eu (| | TI I] * (4) | wit (2.3) 


s=1 Li€Cy j=1 i€Cys j=] 


Cy, -.-, Cy representing the c = (7) combinations of ¢ items out of n. Here 6‘(1 — 6)"~‘ 
is the Bernoulli likelihood conditional upon exactly ¢ respondents being in Group A, and a, 
is the likelihood of z given ¢. The mixture form in 2.2 arises because we are unable to observe 
a specific ¢ in our sample. 
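For a small sample, the equivalence of the product form (2.1) and the mixture form (2.2)-(2.3) can be verified by brute force. In the sketch below, `gamma[i]` and `eta[i]` stand for the products over trials of the g and h densities for respondent i. The function names and the numerical values are hypothetical, and the enumeration over all subsets is practical only for very small n.

```python
from itertools import combinations
from math import prod

def likelihood_product(theta, gamma, eta):
    """Product form (2.1): one mixture term per respondent."""
    return prod(theta * g + (1 - theta) * h for g, h in zip(gamma, eta))

def mixture_coefficients(gamma, eta):
    """Coefficients a_t of (2.3), enumerating every subset of size t."""
    n = len(gamma)
    coeffs = []
    for t in range(n + 1):
        a_t = 0.0
        for subset in combinations(range(n), t):
            in_group_a = set(subset)
            # Respondents in the subset contribute gamma, all others eta.
            a_t += prod(gamma[i] if i in in_group_a else eta[i] for i in range(n))
        coeffs.append(a_t)
    return coeffs

def likelihood_mixture(theta, gamma, eta):
    """Mixture form (2.2)."""
    n = len(gamma)
    coeffs = mixture_coefficients(gamma, eta)
    return sum(a_t * theta ** t * (1 - theta) ** (n - t) for t, a_t in enumerate(coeffs))

# Hypothetical density products for n = 4 respondents.
gamma = [0.02, 0.50, 0.10, 0.30]
eta = [0.40, 0.05, 0.20, 0.10]
print(likelihood_product(0.3, gamma, eta), likelihood_mixture(0.3, gamma, eta))
```

The two printed values should agree up to floating-point rounding, since (2.2) is only a regrouping of the terms obtained by multiplying out (2.1).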

A special case of (2.1) arises when we assume that the same randomizing distributions are
used for all n respondents. Thus, $g_{ij} = g_j$ and $h_{ij} = h_j$ for $i = 1, \ldots, n$, and (2.1) reduces to

$$L(z \mid \theta) = \prod_{i=1}^{n} \left[ \theta \prod_{j=1}^{k} g_{j}(z_{ij}) + (1 - \theta) \prod_{j=1}^{k} h_{j}(z_{ij}) \right] \quad \text{for } 0 \le \theta \le 1. \tag{2.4}$$

Whichever the form, in order to find the maximum likelihood estimate, a direct computer
grid search must be made. This is feasible since $\theta$ is only a one-dimensional quantity and is
restricted to the interval from 0 to 1. This can be easily accomplished by using well-known search
techniques applied to the log of the likelihood function (see, for example, Kennedy and Gentle
1980).
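A minimal sketch of such a grid search is given below. It evaluates the log of (2.4) on an evenly spaced grid over [0, 1] and returns the grid point with the largest value. The function names, the grid resolution, and the simulated data (normal randomizing densities with means 50 and 40 and standard deviation 6, which are values used later in the paper's simulation) are choices made for the illustration; this is not the search routine used by the author.

```python
import math
import random

def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def trial_products(z, densities):
    """Product over trials j of f_j(z_ij), one value per respondent i."""
    return [math.prod(f(v) for f, v in zip(densities, z_i)) for z_i in z]

def log_likelihood(theta, gamma, eta):
    """Log of (2.4), with gamma_i and eta_i the density products for respondent i."""
    total = 0.0
    for g, h in zip(gamma, eta):
        mix = theta * g + (1.0 - theta) * h
        if mix <= 0.0:                       # can happen only at the end points
            return -math.inf
        total += math.log(mix)
    return total

def grid_search_mle(gamma, eta, steps=1000):
    """Grid point in [0, 1] that maximizes the log likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: log_likelihood(t, gamma, eta))

# Simulated sample: n respondents, k trials each, true proportion 0.25.
rng = random.Random(0)
theta_true, n, k = 0.25, 500, 2
g = [lambda v: normal_pdf(v, 50.0, 6.0)] * k
h = [lambda v: normal_pdf(v, 40.0, 6.0)] * k
z = []
for _ in range(n):
    in_group_a = rng.random() < theta_true
    z.append([rng.gauss(50.0 if in_group_a else 40.0, 6.0) for _ in range(k)])
theta_hat = grid_search_mle(trial_products(z, g), trial_products(z, h))
print(theta_hat)
```

Because the log likelihood is concave in the single parameter, a finer grid or a bisection on the derivative would refine the estimate; the plain grid is kept here because it matches the description in the text.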


3. RANDOMIZATION WITH NORMAL DISTRIBUTIONS 


Although any continuous distribution (e.g. Weibull, uniform, etc.) can be used as the randomizing
distribution in the model discussed in Section 2, in this section only the normal distribution
will be examined. Furthermore, suppose that the same randomization distributions
are used for all respondents, so that form (2.4) is the appropriate likelihood. Thus, $g_j$ and $h_j$
are normal densities with means $\mu_{gj}$ and $\mu_{hj}$ and standard deviations $\sigma_{gj}$ and $\sigma_{hj}$, respectively.
Then the likelihood function in Section 2 can be related to these normal densities.

The amount of information that can be obtained about $\theta$ obviously depends on the means
and standard deviations that are chosen. At one extreme, if $\mu_{gj} = \mu_{hj}$ and $\sigma_{gj} = \sigma_{hj}$ for
$j = 1, \ldots, k$, then $\theta$ drops out of the likelihood function and z (the sample) will provide no
information about $\theta$. At the other extreme, if $|\mu_{gj} - \mu_{hj}| \to \infty$ for any j with $\sigma_{gj}$ and $\sigma_{hj}$ fixed,
or if $\sigma_{gj} \to 0$ and $\sigma_{hj} \to 0$ for any j with a fixed $|\mu_{gj} - \mu_{hj}| \ne 0$, then we are effectively able
to determine which group each respondent belongs to and the sampling process thus approaches
Bernoulli sampling in $\theta$.

An approximation to $L(z \mid \theta)$ developed by Winkler and Franklin (1979) makes it easier
to assess the effect of randomization and multiple trials for a specific choice of means and
standard deviations. That is, for each sample, we can approximate the actual likelihood function
given by (2.4) with an approximate likelihood function of the form

$$L^*(r^*, n^* \mid \theta) = \theta^{r^*} (1 - \theta)^{n^* - r^*}. \tag{3.1}$$

Taking the first and second derivatives of the log of the approximating likelihood (3.1) and
solving for the maximum ($\hat{\theta}$) and the curvature at that maximum yields:

$$\hat{\theta} = \frac{r^*}{n^*}, \tag{3.2}$$

$$\left. \frac{\partial^2 \log L^*(r^*, n^* \mid \theta)}{\partial \theta^2} \right|_{\theta = \hat{\theta}} = -\frac{n^*}{\hat{\theta}(1 - \hat{\theta})}. \tag{3.3}$$

Next, taking the first derivative of the log of the exact likelihood (2.4) and setting it equal to
zero gives the equation that yields the exact maximum likelihood estimate of $\theta$:

$$\sum_{i=1}^{n} \frac{\gamma_i - \eta_i}{\theta \gamma_i + (1 - \theta) \eta_i} = 0 \quad \text{where } \gamma_i = \prod_{j=1}^{k} g_j(z_{ij}), \; \eta_i = \prod_{j=1}^{k} h_j(z_{ij}). \tag{3.4}$$

A grid search produces the solution of (3.4), denoted $\hat{\theta}_m$. Taking the second derivative of the log of
the exact likelihood (2.4) yields:

$$\frac{\partial^2 \log L(z \mid \theta)}{\partial \theta^2} = -\sum_{i=1}^{n} \frac{(\gamma_i - \eta_i)^2}{[\theta \gamma_i + (1 - \theta) \eta_i]^2}. \tag{3.5}$$

Substituting $\hat{\theta}_m$ into (3.5) gives the curvature of the actual log likelihood at $\hat{\theta}_m$ (the maximum).

Equations (3.2) and (3.3) are two equations in two unknowns, $r^*$ and $n^*$. Setting (3.2) equal to
$\hat{\theta}_m$ and (3.3) equal to (3.5) evaluated at $\hat{\theta}_m$ allows us to solve for $r^*$ and $n^*$ so that the
approximating log likelihood has the same maximum, $\hat{\theta} = \hat{\theta}_m$, and the same curvature at that
maximum as the actual log likelihood. Thus, the randomized response sample outcome z can be thought of as
approximately equivalent to a non-randomized response sample (i.e. regular Bernoulli sampling) with
$r^*$ members out of $n^*$ in the sensitive group. In this sense, $n^*$ can be thought of as a rough
measure of the amount of information in the randomized response sample, which is of size n.
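The matching step can be sketched as follows. Given the exact maximum likelihood estimate and the per-respondent density products that appear in (3.4), the function below evaluates the curvature (3.5) and solves (3.2)-(3.3) for the equivalent Bernoulli sample. The function name `effective_sample` and the data in the check are hypothetical.

```python
def effective_sample(theta_hat, gamma, eta):
    """Return (r*, n*) matching the maximum and curvature of the exact log likelihood.

    theta_hat is the exact MLE (a solution of (3.4)), assumed to lie strictly
    between 0 and 1; gamma[i] and eta[i] are the density products of (3.4).
    """
    # Curvature of the exact log likelihood at theta_hat, equation (3.5).
    curvature = -sum(
        ((g - h) / (theta_hat * g + (1.0 - theta_hat) * h)) ** 2
        for g, h in zip(gamma, eta)
    )
    # The Bernoulli log likelihood has curvature -n* / (theta (1 - theta)) at
    # its maximum, so equating the two curvatures gives n*.
    n_star = -curvature * theta_hat * (1.0 - theta_hat)
    r_star = theta_hat * n_star               # from (3.2)
    return r_star, n_star

# Limiting check: perfectly separated distributions, 2 of 8 respondents in Group A.
gamma = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
eta = [1.0 - g for g in gamma]
r_star, n_star = effective_sample(0.25, gamma, eta)
print(r_star, n_star)
```

In this limiting case the randomization hides nothing, and the sketch returns $r^* = 2$ and $n^* = 8$ up to rounding, that is, the full sample. When the two randomizing distributions coincide, every term of the curvature is zero and $n^* = 0$, in line with the two extremes described earlier in this section.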

4. A COMPUTER SIMULATED INVESTIGATION OF THE CHOICE OF MEANS AND STANDARD DEVIATIONS


To investigate the impact of a given set of means and standard deviations for the normal
randomizing distributions, as well as the impact that the size of $\theta$ and $k$ (the number
of trials) has upon $r^*$ and $n^*$, the randomized response sampling process was simulated by
generating, via computer, repeated samples from a Bernoulli process with parameter $\theta$ and
$k$ sets of two-digit responses for each sample. In our simulation, we let $\mu_{g_j} = 50$,
$\mu_{h_j} = 40$, and $\sigma_{g_j} = \sigma_{h_j} = \sigma$ for $j = 1, \ldots, k$. We considered
two values of $\theta$ (.10 and .25), two values of $\sigma$ (6 and 9), three values of $n$
(50, 200, and 500), and three values of $k$ (1, 2, and 3). Such values were chosen since they
produce two-digit deviates whose distributions overlap considerably, and they thus provide a
benchmark for later choices in the actual survey environment. For each of the 36 combinations
of parameters, we replicated the sampling procedure 25 times. The solutions of $r^*$ and $n^*$
were found numerically for each sample, and the average values of $n^*$ for the 25 replications
with each set of parameter values are given in Table 1.
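
The data-generating step of such a simulation might be sketched as follows. This is an assumption-laden illustration rather than the original program: the function name, the truncation of each normal deviate to an integer, and the clamping of values to the range 0 to 99 are choices made here so that every response has two digits.

```python
import random


def simulate_sample(n, k, theta, mu_g, mu_h, sigma, rng):
    """Generate n simulated respondents, each with k two-digit responses.

    Group membership is Bernoulli(theta); members draw from normal(mu_g, sigma)
    and non-members from normal(mu_h, sigma). All names are hypothetical.
    """
    sample = []
    for _ in range(n):
        in_group = rng.random() < theta        # Bernoulli(theta) membership
        mu = mu_g if in_group else mu_h
        trials = []
        for _ in range(k):
            value = int(rng.gauss(mu, sigma))  # truncate toward zero
            trials.append(min(99, max(0, value)))  # assumed clamp to two digits
        sample.append(tuple(trials))
    return sample


# One replication for one of the parameter combinations described above.
rng = random.Random(1)
sample = simulate_sample(n=200, k=3, theta=0.25, mu_g=50, mu_h=40, sigma=6, rng=rng)
```

Repeating this 25 times for each parameter combination and averaging the resulting $n^*$ values would mirror the design described in the text.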

The average values of $n^*$ vary considerably. At the worst extreme, when $\sigma = 9$,
$\theta = .10$, and only one trial per respondent is used, $n^*$ tends to be only 10-15 percent
of $n$. On the other hand, when $\sigma = 6$, $\theta = .25$, and three trials are used per
respondent, $n^*$ is about 75 percent of $n$. As expected, the average value of $n^*$ (the
effective sample size) increases as $n$ (the number of respondents) increases or as $k$ (the
number of trials per respondent) increases. In addition, decreasing $\sigma$ or increasing
$\theta$ also leads to a higher $n^*$.

For each combination of parameters, the mean and variance of $\hat{\theta}$ over the 25
replications were determined. The average values of $\hat{\theta}$ are very close (within 5%)
to the corresponding values of $\theta$, and the variance of $\hat{\theta}$ tends to increase
as the average $n^*$ decreases, which tends to validate the simulation.


Table 1

Average Values of the Effective Sample Size ($n^*$) for Various Sample Sizes ($n$) and the
Number of Trials per Respondent ($k$)

                 $\theta = .10$                  $\theta = .25$
  n    k    $\sigma = 6$   $\sigma = 9$    $\sigma = 6$   $\sigma = 9$
 50    1        16.2       [illegible]        17.3            9.2
       2    [illegible]    [illegible]        30.6           17.8
       3        32.6       [illegible]        38.2       [illegible]
200    1        58.3          24.8            79.0           41.2
       2       103.1          49.6           124.4           72.9
       3       136.6       [illegible]       151.0           97.7
500    1       148.4          59.6           196.9          103.6
       2    [illegible]      129.3           309.5          181.2
       3       345.8         193.1           375.6       [illegible]

5. A PORTABLE, COMPUTERIZED RANDOMIZING DEVICE


Randomized-response sampling, using randomization with normal distributions and multiple 
trials, provides flexibility to the experimenter, who can select means and variances as well as 
the number of respondents and the number of trials per respondent. However, this flexibility 
is not of any value, unless the sampling scheme actually can be implemented in practice. The 
sampling scheme utilizing Bernoulli randomization can be implemented in a number of ways 
(e.g., with cards or colored beads). However, the scheme developed in this paper requires 
generation of random normal values by some portable device. 

A computerized, electronic device was built around the Intel 8080 microprocessor to generate 
and display random normal values. Each value is obtained by summing 16 uniformly distributed 
random numbers and transforming that sum to achieve a normal deviate with the desired mean 
and standard deviation. From the Central Limit Theorem, the resulting values should be 
approximately normally distributed, and extensive tests indicate that the values produced by 
the device do indeed behave like random normal values. This technique was chosen over other 
possible methods of generating normal deviates due to the simplicity of programming such 
a method in machine instructions for this specific microprocessor. For more details concer- 
ning the generation of the random normal values and the testing of the device, see Franklin 
(1977), Kennedy and Gentle (1980), as well as Knuth (1969). 
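
The generation technique described above can be illustrated in Python. This is a sketch of the general method only and says nothing about the device's actual machine code; the function name and the use of Python's standard uniform generator are assumptions. A sum of 16 independent uniform(0, 1) values has mean 8 and variance 16/12, so it is standardized with those constants and then rescaled to the desired mean and standard deviation.

```python
import math
import random


def clt_normal(mu, sigma, rng, terms=16):
    """Approximate normal deviate built from a sum of uniform random numbers."""
    total = sum(rng.random() for _ in range(terms))
    # Standardize: the sum has mean terms/2 and variance terms/12.
    standardized = (total - terms / 2.0) / math.sqrt(terms / 12.0)
    return mu + sigma * standardized


rng = random.Random(0)
values = [clt_normal(40.0, 5.0, rng) for _ in range(20000)]
mean = sum(values) / len(values)
sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
```

One consequence of the construction is that the deviates are bounded: with 16 terms no value can fall more than about 6.9 standard deviations from the mean, which is immaterial when the values are displayed as two-digit numbers.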

The resulting device is approximately the size of a cigar box and is easily held in the hand.
Power can be supplied either by a battery pack or by an extension cord.

For display purposes, the random normal values are truncated to two digits, and the device
is designed to display six such two-digit numbers simultaneously in two "windows" of six digits
each. One window displays values chosen from $g_1$, $g_2$, and $g_3$, which appear as a single
six-digit number in the "Yes" window. The other window displays values chosen from $h_1$, $h_2$,
and $h_3$, which also appear as a single six-digit number, for "No". The six means and standard
deviations are stored permanently in the device, but they can be changed easily by using a small,
detachable keyboard.

The actual surveying process is accomplished in the following manner. First, the interviewer
asks the respondent a sensitive question about Group A. The respondent then pushes a button
to activate the device, and two six-digit numbers appear in the windows within about one quarter
of a second. If the respondent is a member of Group A, the number in the first window (the
"Yes" window) is reported; otherwise, the number in the second window (the "No" window)
is reported. To convince the respondent of the "randomness" of the values, he or she is
encouraged to press the button several times and to observe the resulting numbers before the
sensitive question is actually asked. Note that although $k = 3$, the respondent perceives a
response as a single six-digit number, and we are thus actually obtaining three trials with a
single six-digit response. Hence, the advantage of multiple trials per respondent is exploited
without the usual accompanying disadvantages coming into play.
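
The relationship between one six-digit response and its three two-digit trials can be made concrete with a short sketch. The function names are hypothetical, and the straightforward left-to-right digit order is an assumption; the response is kept as a string so that leading zeros are preserved.

```python
def pack_trials(trials):
    """Combine three two-digit values (0-99) into one six-digit string."""
    if len(trials) != 3 or any(not 0 <= t <= 99 for t in trials):
        raise ValueError("expected three integers between 0 and 99")
    return "".join(f"{t:02d}" for t in trials)


def unpack_response(response):
    """Split a reported six-digit string back into three two-digit trials."""
    if len(response) != 6 or not (response.isascii() and response.isdigit()):
        raise ValueError("expected a string of exactly six digits")
    return [int(response[i:i + 2]) for i in (0, 2, 4)]


shown = pack_trials([43, 7, 51])     # what a window would display
trials = unpack_response(shown)      # what the analyst recovers
```

The analyst works with the three recovered values, while the respondent only ever sees and reports one number.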


6. SURVEY RESULTS AND CONCLUSIONS 


Two simultaneous, but independent, surveys of students enrolled at a large urban university
were conducted on its campus. The first asked five sensitive questions of a respondent by the
direct question method. The second asked the same five sensitive questions of a different
respondent but using randomized response sampling with continuous randomization implemented
by the electronic device presented in the previous section. For the study, $k = 3$ and
$\mu_{g_1} = \mu_{g_2} = \mu_{g_3} = 40$ and $\mu_{h_1} = \mu_{h_2} = \mu_{h_3} = 50$ with
$\sigma_{g_j} = \sigma_{h_j} = 5$ for $j = 1, 2, 3$. These values were chosen in accordance with
the findings of the computer simulation discussed in Section 4. A different group of students
was systematically selected (one in five) for each of the two surveys from students on the
campus and individually interviewed. Each student surveyed was given a brief introduction as
to the purpose of the survey and asked whether he or she wished to participate. Fewer than 10%
of the individuals stopped by the two survey teams declined to participate. If the individual
was willing to participate, he/she was then asked to provide his/her social security number to
verify that he/she was, indeed, enrolled in the university. All respondents in both surveys had
their social security numbers checked against an administrative master list of students, and
those not recorded as enrolled students were eliminated from the study (less than 5 percent of
those surveyed).

Requiring the social security number also deliberately injected the element of associating
the individual's identity with his or her responses. For many surveys (e.g., telephone surveys,
mail-in questionnaires, house-to-house surveys, etc.), this is the case and plays a significant
role in the willingness of a respondent to answer truthfully. It was felt that it was precisely
in such "revealing" circumstances that randomized response sampling can benefit the researcher
most. The resulting sample sizes for the direct and randomized response methods were
$n_d = 473$ and $n_r = 477$. The five sensitive questions were:


Q1 - "Have you ever cheated on an exam here at this university?"
Q2 - "Would you ever cheat on your income tax?"
Q3 - "Would you ever steal from an employer?"
Q4 - "Have you smoked any marijuana in the last 30 days?"
Q5 - "Have you ever participated in a homosexual act?"


All five questions were felt to be sufficiently sensitive that any gains by randomized
response sampling over direct sampling would be readily apparent. In addition, as a final
question, the respondents in the randomized response group were asked "Do you think your friends
would be more willing to tell the truth if they were asked sensitive questions by this technique?"
This was asked in an effort to measure the acceptance and confidence of the person being
interviewed that this particular randomized response technique did provide personal protection
and anonymity.

The estimates of the proportion of respondents who are in the sensitive group are presented
in Table 2 for both the direct ($\hat{\theta}_{id}$) and randomized response ($\hat{\theta}_{ir}$)
methods for question $i$, along with the estimate of $n_i^*$ (the effective sample size) for the
randomized response method using the method discussed in Section 3. Also presented is the
z-value of a one-sided test of the hypothesis $H_0: \theta_{id} - \theta_{ir} = 0$ vs.
$H_A: \theta_{id} - \theta_{ir} < 0$, along with the observed p-values. The tests were conducted
using $n_d$ and $n_i^*$ as sample sizes and hence give a much more conservative result than
if $n_d$ and $n_r$ were utilized.

It is noteworthy that the randomized response method gave a higher estimate of $\theta$ for
each of the five sensitive questions than the direct survey method. Furthermore, for Questions 1,
2, and 5, the randomized response method gave statistically significantly higher estimates
of $\theta$ (p-values < .001 for all three) than the direct survey method. Hence, there seems to
be conclusive evidence that, at least for some sensitive issues, the randomized response method
with continuous randomization does provide better estimates of population proportions. It
should also be noted that, with our choices of $\mu_{g_j}$, $\mu_{h_j}$, $\sigma_{g_j}$, and
$\sigma_{h_j}$ and $k = 3$, $n^*$ typically was 75 to 85 percent of the original sample size $n$,
and thus most of the information was "recovered" by our randomized response method.

Table 2

Estimates of $\theta$ and Results of Testing Equality of $\theta$'s for Direct and Randomized
Response Sampling with Respective Sample Sizes of $n_d = 473$ and $n_r = 477$

Question                                                Effective sample size
   i       $\hat{\theta}_{id}$   $\hat{\theta}_{ir}$           $n^*$            z-value    p-value
   1             .0634                 .2013                   394.5             6.098     < .0001
   2             .1797                 .2941                   408.1             3.997     < .0001
   3             .1078                 .1207                   384.8              .583       .2810
   4             .1882                 .1942                   409.5              .234       .4091
   5             .0042                 .0355                   339.0             3.341       .0004
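
A test of this kind can be sketched as follows. The text does not state which variance estimate was used, so the pooled-proportion form below is an assumption, as are the function and argument names; it is offered only as an illustration of a one-sided two-sample test that uses the effective sample size in place of the actual randomized response sample size.

```python
import math


def one_sided_z_test(p_direct, n_direct, p_rr, n_eff):
    """One-sided z test of H0: theta_d - theta_r = 0 against HA: theta_d - theta_r < 0.

    Uses a pooled proportion for the standard error (an assumed choice) and the
    effective sample size n_eff for the randomized response estimate.
    """
    pooled = (p_direct * n_direct + p_rr * n_eff) / (n_direct + n_eff)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_direct + 1.0 / n_eff))
    z = (p_rr - p_direct) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_value


# Question 1 of Table 2: direct estimate, n_d, randomized estimate, effective size.
z1, p1 = one_sided_z_test(0.0634, 473, 0.2013, 394.5)
```

For Question 1 this pooled form gives a z-value of roughly 6.09, close to the tabulated 6.098; small differences from the table are to be expected from rounding of the published estimates.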


Furthermore, it is instructive to consider the nonsignificant results for Questions 3 and 4.
This information (if the three significant results are ignored) could lead an observer to
conclude that randomized response techniques are not particularly advantageous over direct
questioning. However, in the light of the three significant differences revealed, this lack of
significance could perhaps be interpreted as indicating that the question was not really
"sensitive enough" to lead to dramatic differences in the estimates of $\theta$, or even that the
question was "so sensitive" that the respondent chose to lie even with the randomized response
technique. In addition, Question 1 ("Have you ever cheated on an exam?") seemed to the
experimenter to be relatively "unsensitive", but in retrospect the answer to this question, when
tied to the social security number of the respondent (given before the questioning process
started), presented a much more threatening circumstance than was initially realized. Thus,
perhaps some of the confusion about the efficacy of the randomized response technique is related
to the "true sensitivity" of the question for the interviewee as opposed to the "perceived
sensitivity" by the interviewer or experimenter. These aspects need further examination.

Finally, 88.9% (424 of the 477) felt "their friends would be more likely to answer truthfully
sensitive questions by this randomized response technique." While some reservations may be
expressed because of the respondents' "desire to please the interviewer," this overwhelming
percentage, coupled with the significant differences already discussed, seems to be strong
evidence that this technique was accepted and felt to be protective of the interviewee.


7. DISCUSSION 


The model developed in this paper permits the use of continuous, as well as discrete,
randomizing distributions in utilizing randomized response sampling from a dichotomous
population. In order to implement the model using randomization with normal distributions, a
computerized, electronic device was also developed and discussed. The device is portable, has
programmable means and standard deviations for the six normal distributions, and provides,
from a single six-digit response, three separate two-digit trials. Such a system has both
potential advantages and disadvantages over other randomized response techniques.

First, as alluded to in the introduction, a computerized randomizing device could be superior
to the standard randomized response methods of "drawing cards" or "flipping a spinner",
since these methods may not be properly implemented by either the respondent or the
interviewer, which would induce uncontrolled error. (See Abernathy, Greenberg and Horvitz (1970)
for a discussion of the problems of "insufficient card shuffling" and "card loss" as well as
insufficient interviewer training.) Since the production of the randomizing values is
computerized, the distributional problems that can accompany, and have accompanied, the use of
cards, beads, and spinners are eliminated, because the problem of "random selection of values"
is taken out of the hands of the interviewer and respondent and placed in the "hands" of the
computer. If the computerized device fails, it is usually a complete, catastrophic crash of the
whole chip, which is readily apparent and very rare.

The second (and perhaps greatest) advantage is the ability of the device to present a choice
of two numbers, each six digits in length, from which the respondent chooses in order to answer
"yes" or "no". What seems to the respondent to be a single six-digit answer is in fact three
separate two-digit answers and in effect provides three trials per respondent. Thus, the benefits
of multiple trials per respondent are gained but, since the respondent is unaware of the multiple
trials format, without the usual accompanying disadvantages (noted by Liu and Chow 1976)
coming into play.

In addition, the freedom to choose the six means and six standard deviations provides
the experimenter with additional flexibility over standard randomized response techniques.
For instance, if it is felt that the differences in the first two digits are most noticeable to
respondents, the experimenter can make $\mu_{h_1}$ and $\sigma_{h_1}$ close to (or even equal to)
$\mu_{g_1}$ and $\sigma_{g_1}$, respectively. Similarly, if the middle two digits might receive
the least attention, the experimenter could attempt to gain the most information from these
values by separating $\mu_{h_2}$ and $\mu_{g_2}$ the furthest. It is also possible to wire the
displays in other than the obvious manner. For instance, the two digits of the first random
normal value could appear as the fifth and second digits of the six-digit number instead of the
first and second digits. This flexibility in wiring, together with the choice of parameters,
should provide a sampling scheme that is quite informative to the researcher without seeming
to threaten the respondent.

It should also be noted that while for this particular microprocessor it was convenient to
utilize randomization with normal distributions, several other continuous distributions (e.g.
uniform, Weibull) or even multi-valued discrete distributions (e.g. multinomial or Poisson)
could have been used. Further investigation into newer microprocessors as well as different
randomizing distributions is recommended.

There are, however, some potential disadvantages associated with this particular randomized
response technique. Since it involves a microprocessor, such a randomizing device costs on
the order of fifteen hundred to two thousand dollars to produce. However, its versatility
in wiring and programming should allow a device to be used in many investigations
over several years and thus help to defray its rather high cost.

More difficult to quantify is the respondent's perception of the computerized device and
the degree of confidence or suspicion he/she might have about the device. Do respondents fear
that the computerized device is somehow "storing" their answer in a form that can later be
deciphered to expose them? From the survey results, it seems that greater truth telling was
secured by using the computerized randomizing device than by the direct survey method.
Nevertheless, further study is recommended to compare this randomized response technique, which
uses the computerized device, with other more standard randomized response techniques.

In practice, several matters are relevant in the consideration of design issues (i.e., the
selection of means and standard deviations for the device). In order to gain more information
for a given sample size, we should increase $|\mu_{g_j} - \mu_{h_j}|$ and decrease
$\sigma_{g_j}$ and $\sigma_{h_j}$ for $j = 1, 2, 3$. However, as this is done, it will become
clearer to the respondent that, despite the randomization, the response is very revealing
concerning the respondent's group membership. As a result, the respondent may not answer
honestly or may refuse to answer. Additional study is needed to determine optimal values for
the choice of means and standard deviations. The results in Table 1 give some indication of the
effects of varying a common standard deviation. But from a practical viewpoint, the field survey
seemed to indicate that the choice of means separated by two standard deviations was able both
to gain the confidence of the respondent and (with the multiple trials) to gain back from 75 to
85 percent of the original sample size without the usual "loss of confidence" that accompanies
multiple trial techniques.
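
How revealing a given response is can be illustrated with Bayes' rule. The calculation below is not taken from the paper: it is a sketch that assumes a prior proportion for the sensitive group and normal randomizing distributions with the field-survey settings of Section 6 (means 40 and 50, standard deviation 5, three trials), and the function names are hypothetical.

```python
import math


def normal_pdf(z, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_membership(response, theta, g_params, h_params):
    """Probability that a respondent is in the sensitive group given the response.

    response: tuple of k reported values; theta: assumed prior proportion;
    g_params, h_params: k (mean, sd) pairs for the two randomizing distributions.
    """
    gam = math.prod(normal_pdf(z, m, s) for z, (m, s) in zip(response, g_params))
    eta = math.prod(normal_pdf(z, m, s) for z, (m, s) in zip(response, h_params))
    return theta * gam / (theta * gam + (1.0 - theta) * eta)


g = [(40, 5)] * 3
h = [(50, 5)] * 3
ambiguous = posterior_membership((45, 45, 45), 0.2, g, h)  # midway between the means
revealing = posterior_membership((40, 40, 40), 0.2, g, h)  # all trials at the g mean
```

A response midway between the two means leaves the prior unchanged, whereas a response sitting on one set of means is nearly decisive. Widening the separation or shrinking the standard deviations moves more responses into the second category, which is the trade-off described above.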

In particular, the field trial compared the direct survey technique with the randomized
response technique using the electronic device discussed, with $\mu_{h_j} = 40$ and
$\mu_{g_j} = 50$ and $\sigma_{h_j} = \sigma_{g_j} = 5$ for $j = 1, 2, 3$ for the normal
randomizing distributions. Of the five sensitive questions which were asked of the two
(independent) groups, the randomized response method provided significantly greater estimates
(p < .001) than the direct method for three of the questions. In addition, 88.9% of the subjects
interviewed by the randomized response technique felt "their friends would be more likely to
tell the truth if they were asked sensitive questions by this technique". Thus, it seems that
(for at least certain questions) this randomized response sampling technique achieved greater
honesty in response than the direct sampling method.

The question of protection of the respondent's privacy needs to be discussed. It is not ethical
to tell the respondent that his or her group membership is disguised by the randomization if,
in fact, the disguise is transparent to the researcher (for example, by recording only even
numbers for "YES" and only odd numbers for "NO"). With the electronic device that has
been discussed, it seems indeed possible to provide true privacy without losing much
information. If the means and standard deviations are programmed into the device and are not
provided to an interviewer, the interviewer will find it very difficult to discriminate between
group members and non-group members in the interviewing process, particularly if the wiring is
"scrambled". Thus, the flexibility that enables us to gain information without threatening the
respondent also helps to disguise the actual group membership from the interviewer.

Small Area Estimates of Proportions Via Empirical Bayes Techniques

ABSTRACT


Empirical Bayes techniques are applied to the problem of "small area" estimation of proportions. Such
methods have been previously used to advantage in a variety of situations, as described, for example, 
by Morris (1983). The basic idea here consists of incorporating random effects and nested random effects 
into models which reflect the complex structure of a multi-stage sample design, as was originally pro- 
posed by Dempster and Tomberlin (1980). Estimates of proportions can be obtained, together with 
associated estimates of uncertainty. These techniques are applied to simulated data in a Monte Carlo 
study which compares several available techniques for small area estimation. 

1. INTRODUCTION


1.1 The Problem 


Complex multi-stage surveys are used to obtain estimates of proportions in many research
disciplines (e.g., epidemiology, economics, criminology, etc.). Not only are estimates for local
areas and other special subgroups required, but there is also a need for reliable measures of
the accuracy of these estimates. This suggests the need for improved methodologies for
this estimation problem and related statistical inference.

In addition, the techniques based on the standard normal theory used by Fay and Herriot 
(1979) to estimate income, a continuous random variable, in small areas are no longer directly 
applicable to the problem of estimating proportions for discrete outcome variables. Here, it 
is the logit transform of the proportion, not the proportion itself, that will be modelled in a 
linear way. This creates the same problems of estimation as in classical statistical logistic regres- 
sion theory. (See Haberman 1978.) Unfortunately, fewer attempts have been made to solve 
this obviously more complex problem in small area estimation. 
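
The role of the logit transform can be illustrated briefly. The sketch below is generic and uses hypothetical names and coefficient values; it only shows that a model which is linear on the logit scale always maps back to a proportion strictly between 0 and 1, which a linear model for the proportion itself does not guarantee.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit transform, written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predicted_proportion(intercept, slope, x):
    """Proportion implied by a linear predictor on the logit scale."""
    return inv_logit(intercept + slope * x)


# Hypothetical coefficients: the fitted proportion stays inside (0, 1).
p = predicted_proportion(-2.0, 0.5, 10.0)
```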

In order to address the problem of inference from a relatively thinly spread complex, multi- 
stage survey to small areas or domains not necessarily included in the survey, we have chosen 
an explicitly model-based approach. This was proposed originally by Dempster and Tomberlin 
(1980) for the estimation of census undercount from a post-enumeration survey. The meth- 
odology uses both a random effects, multiple logistic regression model and empirical Bayes 
techniques. This directly yields estimates of uncertainty associated with the estimated propor- 
tions for small areas via a Bayesian paradigm. This explicitly model-based method differs 
substantially from the implicitly model-based approach of the synthetic estimation techniques 
of Gonzalez and Hoza (1976, 1978), Gonzalez and Waksberg (1975), and others. 

As a typical complex survey will often have a nested structure of primary sampling units
(PSU's), secondary sampling units (SSU's) within PSU's, tertiary sampling units (TSU's) within
SSU's and, finally, households within TSU's, the explicitly model-based approach will allow
us to take into account the complexity of the sample design. The purpose of introducing a
random effects model is to allow the data to determine, by empirical Bayes techniques, an
appropriate compromise between the classical unbiased estimates, which depend only on data
in the specific local area, and the fixed effects estimates, which pool information across areas.

In Section 1.2, a literature review is given and a solution to the problem of estimating pro- 
portions for small areas is proposed. The model and its associated estimates are made explicit 
in Sections 2 and 3 respectively. The results are applied to simulated data in a Monte Carlo 
study presented in Section 4. 


1.2 The Review and a Proposed Solution to the Problem 


Because of the growing need for small area statistics in recent years, and because reliable 
estimates for small areas or subdomains are not usually directly available by classical sample 
survey methods, several researchers have focused on the problem of small area estimation. 
This has necessitated the use of explicitly or implicitly model-based methods which allow for
"borrowing strength" across small areas in order to increase the effective sample size for
estimation, and hence the accuracy of the resulting estimates. Although much of the research in this
area has applied linear model techniques and concentrated on the estimation of means or totals, 
rather than proportions, a discussion of the literature on these estimators and the criteria used 
to evaluate them can add valuable insight into our problem. 

Classical theory dictates that estimators should be design-consistent and, if possible,
essentially design-unbiased. However, these estimators are not always particularly useful when
the sample sizes are small.

Gonzalez (1973) described the method of synthetic estimation as follows: "An unbiased
estimate is obtained from a sample survey for a large area; when this estimate is used to derive
estimates for sub-areas on the assumption that the small areas have the same characteristics
as the larger area, we identify these estimates as synthetic estimates." Its first reported
use appears to have been by the U.S. National Center for Health Statistics (1968) for the calculation of state
estimates of long and short term disability rates. Various authors subsequently tried to formalize 
this concept of synthetic estimation, in particular, for means of continuous outcome variables, 
using both ad hoc and model-based approaches. Gonzalez (1973), Gonzalez and Waksberg 
(1975), Gonzalez and Hoza (1976) and Levy and French (1978) used previous census data to 
form post-strata which are subsequently used to combine information across small areas under 
the assumption that the mean response is similar across a section of these areas. Levy (1971), 
Ericksen (1973, 1974) and O’Hare (1976) employed regression methods in order to incorporate 
auxiliary information in small area estimation. The accuracy of this method has been evaluated 
in terms of its average sampling mean squared error over all small areas in a region. 

Ericksen (1974) warned that there is no systematic methodology for the assessment of the 
bias or accuracy of synthetic estimators. Despite these shortcomings, synthetic estimation still 
remains a potentially powerful and attractive tool. There have been many reported empirical 
evaluations both on actual and simulated data sets of synthetic estimation in recent years, 
including Levy (1971), Gonzalez (1973), Gonzalez and Hoza (1978), and Schaible (1979). Several 
of these types of studies are described in a volume edited by Platek and Singh (1986). 

Royall (1970, 1973), using a model-based approach, also considered the problem of 
estimating totals in finite populations, when auxiliary information is available. He established 
a probability model of the relationship between the variable of interest and the auxiliary variable 
and then derived optimal subdomain predictors. 

Holt, Smith and Tomberlin (1979) and Laake (1979) applied the predictive approach of
Royall to the problem of small area estimation. Laake (1979) found that in contrast to the syn- 
thetic approach, where biased estimators are usually obtained without an explicit method of 
estimating the bias, the prediction approach yielded estimates of mean squared error (MSE) 
as a tool for the comparison of estimators. In the problem of estimating small area totals, Holt, 
Smith and Tomberlin (1979) specified various possibilities of population structure in order to 
model the assumed relationship across subareas. With a specified model, it becomes possible 
to determine whether or not it is supported by the data and also to study the effect of model 
misspecification on the bias of the observed estimators. Under different models, the variance 
of the estimator, the estimate of the variance and MSE change. They built model-based con- 
fidence intervals, which have interpretations in terms of repeated realizations under the super- 
population model. 

Purcell and Kish (1979, 1980) reviewed the different existing techniques of small area
estimation, subdividing them into the following broad categories: regression-based procedures,
the use of empirical Bayes and of Bayesian methods, superpopulation prediction theory, clustering
techniques, and categorical data analysis methods. They underlined the fact that small area
domain estimation should not be considered as a homogeneous problem, but that there exist
many other interacting factors, such as domain size, which should be taken into account when
choosing the type of estimator. Särndal (1984) later confirmed this.


The most serious shortcoming of model-dependent estimators is that useful estimates of 
mean squared errors are not available using fixed effects models because associated variance 
estimates do not reflect the bias inherent in estimates based on models having a reduced set of 
parameters. Two different approaches were then taken to the problem of small area estimation. 


Fay and Herriot (1979) used the James-Stein theory of estimation (James and Stein 1961) 
on sample data to determine estimates of income for small places from the 1970 US Census 
of Population and Housing. In fact, they used an empirical Bayes approach which originated 
with Robbins (1955) and has been described by Efron and Morris (1975), thus formalizing the 
meritorious suggestion of Madow and Hansen (1975) of forming a weighted average of the 
sample and regression estimates. A similar approach by Schaible et al. (1977) gives a method
for arriving at a composite estimator which is the weighted average of the unbiased and synthetic
estimators. For other examples of empirical Bayes methods for small area estimation
based on standard normal theory see Stroud (1987) and Cressie (1988). 
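
The idea of a weighted average of a direct sample estimate and a regression (or synthetic) estimate can be sketched as follows. The weight used here, the model variance divided by the sum of the model and sampling variances, is the usual normal-theory shrinkage weight and is stated as an assumption rather than quoted from any of the papers cited; the function and argument names are hypothetical.

```python
def composite_estimate(direct, regression, sampling_var, model_var):
    """Weighted average of a direct estimate and a regression estimate.

    The weight on the direct estimate is model_var / (model_var + sampling_var),
    so noisy direct estimates are pulled more strongly toward the regression value.
    """
    if sampling_var < 0 or model_var < 0 or sampling_var + model_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    weight = model_var / (model_var + sampling_var)
    return weight * direct + (1.0 - weight) * regression


# Equal variances give the simple midpoint of the two estimates.
midpoint = composite_estimate(10.0, 20.0, sampling_var=1.0, model_var=1.0)
```

In an empirical Bayes treatment the model variance is itself estimated from the data, which is what distinguishes the approach from a composite estimator with weights fixed in advance.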

Battese, Harter and Fuller (1988), using a prediction approach, proposed a nested error 
regression model in order to estimate means. A more general model, a random coefficients 
regression model, had been previously proposed for a similar problem by Dempster, Rubin 
and Tsutakawa (1981). They used Bayesian techniques to estimate fixed and random effects 
in covariance component models when the covariances and variances are tentatively assumed 
to be known and the EM algorithm to subsequently estimate these unknown parameters. The 
introduction of random effects models not only allows for standard maximum likelihood 
estimation, but also provides measures of the reliability of the final estimates of the parameters 
in the form of posterior variances. 

Ericksen (1980) suggested using the mean squared error (MSE) to evaluate effectiveness of 
regression in small area estimation. He attempted to answer such questions as: When should 
more predictor variables be added to the regression equation? Should James-Stein weighting 
procedures be used when the synthetic and the regression estimate are far apart? He also warned 
of the effects of outliers on both the resulting estimate and its estimated error. Perhaps the 
effect on small area estimators of the failure of the linear model assumptions should be more 
seriously studied. 


240 MacGibbon and Tomberlin:Small Area Estimates of Proportions 


Although applied to the estimation of counts such as unemployment and mortality statistics, 
most of these techniques described were designed primarily for continuous outcome variables. 
Purcell and Kish (1980) introduced a categorical data analysis method for obtaining estimates 
of counts for small domains. Essentially, their methodology involves fitting log-linear models 
to the data, omitting some of the higher order interaction terms and obtaining estimates by 
the iterative proportional fitting algorithm described by Deming and Stephan (1940). We 
propose to extend these models to the problem of estimation of proportions in small domains 
as originally conceived by Dempster and Tomberlin (1980) by applying empirical Bayes tech- 
niques to logistic regression models with random effects. This would have the added advan- 
tage that a measure of uncertainty of the small area estimates would be available through the 
approximate posterior variances. The estimator proposed here is similar in nature to the composite 
one used by Schaible et al. (1977) for unemployment rates, the principal difference being 
in the method for choosing the weights. We feel, however, that the empirical Bayesian para- 
digm gives a more natural and intuitive method for determining the weights. Empirical Bayes 
estimation based on simple logistic random effects has already proven useful in studying 
regional variation in mortality rates by Miao (1977). Somewhat more complex random effects 
models have been used for proportions on data from the World Fertility Survey (Wong and 
Mason, 1985) and for Poisson parameters on automobile insurance data (Weisberg, Tomberlin, 
and Chatterjee 1984 and Tomberlin 1988). 

Roberts, Rao and Kumar (1987) fitted logistic regression models to binary outcome data 
obtained using complex sampling schemes, constructed "pseudo-maximum likelihood" 
estimators, and compared their estimates to unbiased ones. They also proposed a goodness- 
of-fit test for their model, which takes the sampling design into account. A fundamental 
difference between our approach and that of Roberts et al. is that by incorporating the 
characteristics of the sample design into the model, we can estimate parameters, and obtain 
readily interpretable measures of their reliability by means of standard maximum likelihood 
techniques. 


2. THE MODEL 


Following the framework of Dempster and Tomberlin (1980), in its most general form, we 
specify a model which describes the probabilities associated with individuals in the population 
as a function of categorical variables, continuous covariates and sampling characteristics. The 
models we consider in this paper are specific examples of the following, 


$$\operatorname{logit}(\pi_{\mu\nu}) = \theta_\mu + X_{\mu\nu}\beta + \phi_\nu \qquad (2.1)$$


where $\pi_{\mu\nu}$ represents the probability of a "response" for the $\nu$-th unit in the $\mu$-th cell, the 
subscript $\mu$ refers to a set of categorical variable covariates, and the subscript $\nu$ refers to a set 
of nested sampling characteristics, indicating PSU, SSU within PSU, and so on. The parameter 
$\theta_\mu$ represents a sum of fixed classification effects, the parameter $\phi_\nu$ represents a sum of 
random effects associated with sampling characteristics, the vector $X_{\mu\nu}$ represents a vector of 
quantitative covariates, and the parameter $\beta$ is a vector of fixed logistic linear regression 
parameters. The random effects parameters are assumed to have some parametric distribution, 
usually a multivariate normal distribution. The probabilities $\pi_{\mu\nu}$ are obtained by inverting 
the logit transformation as follows, 

$$\pi_{\mu\nu} = [1 + \exp\{-(\theta_\mu + X_{\mu\nu}\beta + \phi_\nu)\}]^{-1}. \qquad (2.2)$$


For purposes of illustration, consider the following simple example. Let the proportion of 
interest be the labour force participation rate. Suppose we have one classification variable 
indicating sex and one continuous covariate indicating the age of the individual. Suppose fur- 
ther that the sample design is a simple, two stage cluster sample. In the first stage, a sample 
of counties is drawn and simple random samples of individuals within selected counties are 
drawn at the second stage. 

For estimation purposes, consider the following model, 


$$\operatorname{logit}(\pi_{\mu ij}) = \theta_\mu + X_{\mu ij}\beta + \phi_i \qquad (2.3)$$


$$\phi_i \sim \text{i.i.d. Normal}(0, \sigma^2). \qquad (2.4)$$


Here, the classification subscript, $\mu$, indicates the sex of the individual; the sampling 
characteristics subscript, $\nu = ij$, indicates the $j$-th individual within the $i$-th PSU; $X_{\mu ij}$ indicates 
the age of the individual and $\phi_i$ is a random effect associated with the $i$-th PSU. 

The consequence of assuming that the PSU effects are independent, identically distributed 
is that PSU departures away from the fixed part of the model are treated as exchangeable; that 
is, apart from effects of age and sex, no systematic information exists regarding differential 
employment rates among the counties in the population. Obviously in a realistic situation, such 
information would exist, for example, dominant industry, distance from principal markets, 
retail sales, etc. In such cases, this auxiliary information should be incorporated into the model. 
However, for purposes of illustration, we will continue with this simple model. The choice of 
a normal distribution of the error terms is a mathematical convenience, and the consequences 
of this choice must also be evaluated after actual data analysis. Extending the simple 
model described in (2.3-4) to include additional covariates, both categorical and quantitative, 
is straightforward. 

In theory, extensions to the model allowing for more complex sample designs are also simple. 
For example, data drawn using a three stage sample could be modelled using nested random 
effects as follows. 


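$$\operatorname{logit}(\pi_{\mu ijk}) = \theta_\mu + X_{\mu ijk}\beta + \phi_i + \phi_{j(i)} \qquad (2.5)$$

$$\phi_i \sim \text{Normal}(0, \sigma_1^2)$$

$$\phi_{j(i)} \sim \text{Normal}(0, \sigma_2^2).$$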


Here, the sampling characteristics subscript, $\nu = ijk$, refers to the $k$-th individual within the 
$j$-th SSU within the $i$-th PSU. The parameter $\phi_i$ is the random effect associated with the $i$-th 
PSU, and $\phi_{j(i)}$ is the nested random effect associated with the $j$-th SSU within the $i$-th PSU. 
Stratification variables could also be incorporated within the fixed effects part of the model. 
While it is simple to write down the models corresponding to sample designs with several stages, 
without further research, it is not yet clear how difficult it will be to produce estimates based 
on these more complex models. 

In an actual application, it would be necessary to use the data to identify predictor variables. 
This would require the development of some form of model selection technique. While not 
the primary focus of this paper, one might conceive of such a technique being based on an 
initial analysis using conventional variable selection techniques for logistic regression models 
as described by Haberman (1978), for example. Such an analysis could be conducted, ignoring 
the random effects parameters. Having chosen a set of predictors, the random effects would 
then be incorporated in the manner dictated by the sample design. 


3. ESTIMATES 


In this section, we develop empirical Bayes estimates for the simple model described in equations 
(2.3-4). First, it is assumed that the variance component, $\sigma^2$, is known, and Bayesian 
estimates of the probabilities $\pi_{\mu ij}$ are obtained. Then, the EM algorithm, as described by 
Dempster, Laird and Rubin (1977), is used to obtain the maximum likelihood estimate of $\sigma^2$, 
allowing for empirical Bayes estimates. Finally, posterior variances of these estimates are 
obtained. The development of these estimates is similar to that described by Laird (1978) and 
by Tomberlin (1988). 


3.1 Bayes Estimates 


As noted by Laird (1978) in her analysis of contingency tables, by Dempster, Rubin and 
Tsutakawa (1981) in their analysis of variance components for linear models, and by Tomberlin 
(1988) in his analysis of Poisson data, a Bayesian analysis of a mixed model such as described 
in (2.3-4) can be obtained by placing a flat prior on the fixed parameters, $\theta_\mu$ and $\beta$, and the 
proper prior given in (2.4) on the random parameters, $\phi_i$. 

Let the vector of 0-1 outcome variables indicating membership in the labour force be 
represented by $y$ and let $\pi$ represent a vector of the individual probabilities $\pi_{\mu ij}$. The data are 
then distributed as a product binomial given by, 


$$p(y \mid \pi) \propto \prod_{\mu ij} \pi_{\mu ij}^{y_{\mu ij}} (1 - \pi_{\mu ij})^{1 - y_{\mu ij}}. \qquad (3.1)$$


The prior distribution of the parameters is given by, 


$$p(\theta, \phi, \beta \mid \sigma^2) \propto \exp\left[-\sum_i \frac{\phi_i^2}{2\sigma^2}\right]. \qquad (3.2)$$


Thus, the joint distribution of the data, y, and the parameters is given by, 


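$$\begin{aligned}
p(y, \theta, \phi, \beta \mid \sigma^2, X) &= p(y \mid \theta, \phi, \beta, \sigma^2, X)\, p(\theta, \phi, \beta \mid \sigma^2, X) \qquad (3.3) \\
&\propto \left[\prod_{\mu ij} \pi_{\mu ij}^{y_{\mu ij}} (1 - \pi_{\mu ij})^{1 - y_{\mu ij}}\right] \exp\left[-\sum_i \frac{\phi_i^2}{2\sigma^2}\right].
\end{aligned}$$

From this, the posterior distribution of the parameters is given by, 

$$p(\theta, \phi, \beta \mid y, \sigma^2, X) = \frac{p(y, \theta, \phi, \beta \mid \sigma^2, X)}{p(y \mid \sigma^2, X)}. \qquad (3.4)$$
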


It is not feasible to obtain a closed form expression for the posterior given in (3.4) due to the 
intractable integration required to obtain the marginal distribution of y. Here we adopt the 
approximation employed by Laird (1978) and by Tomberlin (1988). The posterior is expressed 
as a multivariate normal distribution having its mean at the mode of (3.4) and covariance matrix 
equal to the inverse of the information matrix evaluated at the mode. 

Obtaining the mode requires solving the following set of equations. This can be accomplished 
by using a multivariate Newton-Raphson algorithm. 


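$$\sum_{\mu ij} \pi_{\mu ij} X_{\mu ij} = \sum_{\mu ij} y_{\mu ij} X_{\mu ij} \qquad (3.5)$$

$$\sum_{ij} \pi_{\mu ij} = \sum_{ij} y_{\mu ij} \qquad (3.6)$$

$$\sum_{\mu j} (y_{\mu ij} - \pi_{\mu ij}) - \frac{\phi_i}{\sigma^2} = 0. \qquad (3.7)$$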


The posterior covariance matrix of the parameters is found by inverting the negative of the 
second derivative matrix of the log of (3.4) taken with respect to the parameters, and evaluated 
at the mode. Note that neither the equations for the mode, nor the covariance matrix involve 
the intractable denominator of (3.4). 
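
As a rough illustration of how the mode might be computed, the following Python sketch applies a Newton-Raphson iteration to model (2.3-4) with $\sigma^2$ treated as known. It is not the authors' program. The function names (`solve`, `score_and_information`, `posterior_mode`), the data layout (tuples of sex code, area index, covariate and 0-1 outcome), the parameter ordering and the zero starting values are all assumptions made here for illustration. The score vector corresponds to the mode equations, and the information matrix is the negative second derivative matrix of the log posterior.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def score_and_information(gamma, data, n_areas, sigma2):
    """Score vector and information matrix of the log posterior.

    Assumed parameter order: [theta_1, theta_2, beta, phi_1, ..., phi_I].
    Each record in data is (sex in {0, 1}, area index, covariate x, outcome y).
    """
    d = 3 + n_areas
    g = [0.0] * d
    h = [[0.0] * d for _ in range(d)]
    for sex, area, x, y in data:
        z = [0.0] * d            # dummy-variable design vector for this unit
        z[sex] = 1.0
        z[2] = x
        z[3 + area] = 1.0
        eta = sum(zk * gk for zk, gk in zip(z, gamma))
        pi = 1.0 / (1.0 + math.exp(-eta))
        w = pi * (1.0 - pi)
        for a in range(d):
            g[a] += z[a] * (y - pi)
            for b in range(d):
                h[a][b] += w * z[a] * z[b]
    for i in range(n_areas):     # contribution of the Normal(0, sigma2) prior
        g[3 + i] -= gamma[3 + i] / sigma2
        h[3 + i][3 + i] += 1.0 / sigma2
    return g, h

def posterior_mode(data, n_areas, sigma2, iterations=25):
    """Newton-Raphson iteration started from zero (no step-size control)."""
    gamma = [0.0] * (3 + n_areas)
    for _ in range(iterations):
        g, h = score_and_information(gamma, data, n_areas, sigma2)
        step = solve(h, g)
        gamma = [gk + sk for gk, sk in zip(gamma, step)]
    return gamma
```

Inverting the information matrix returned at the mode would give the approximate posterior covariance matrix. A production implementation would also want a convergence check and some safeguard against diverging steps, which this sketch omits.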

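Elements of the inverse of the posterior covariance matrix are given by the following, where 
$\log p$ denotes the log of the posterior (3.4), 


$$-\frac{\partial^2 \log p}{\partial \beta^2} = \sum_{\mu ij} \pi_{\mu ij}(1 - \pi_{\mu ij}) X_{\mu ij}^2 \qquad (3.8)$$

$$-\frac{\partial^2 \log p}{\partial \theta_\mu^2} = \sum_{ij} \pi_{\mu ij}(1 - \pi_{\mu ij}) \qquad (3.9)$$

$$-\frac{\partial^2 \log p}{\partial \phi_i^2} = \sum_{\mu j} \pi_{\mu ij}(1 - \pi_{\mu ij}) + \frac{1}{\sigma^2} \qquad (3.10)$$

$$-\frac{\partial^2 \log p}{\partial \beta\, \partial \theta_\mu} = \sum_{ij} \pi_{\mu ij}(1 - \pi_{\mu ij}) X_{\mu ij} \qquad (3.11)$$

$$-\frac{\partial^2 \log p}{\partial \beta\, \partial \phi_i} = \sum_{\mu j} \pi_{\mu ij}(1 - \pi_{\mu ij}) X_{\mu ij} \qquad (3.12)$$

$$-\frac{\partial^2 \log p}{\partial \theta_\mu\, \partial \phi_i} = \sum_{j} \pi_{\mu ij}(1 - \pi_{\mu ij}). \qquad (3.13)$$


3.2 Empirical Bayes Estimates 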


To obtain empirical Bayes estimates, the prior variance, $\sigma^2$, must be estimated from the 
data. A reliable estimate requires a reasonable number of PSU’s in the sample; otherwise, if 
the number of PSU’s is too small, a purely Bayesian approach is recommended. We propose 
to estimate the prior variance using an EM algorithm as described by Dempster, Laird and 
Rubin (1977). The general framework for the estimates is similar to that employed by Laird 
(1978) for contingency table analysis, and Tomberlin (1988) for Poisson data in a two way 
classification. The estimates for the simple two-stage sample are obtained in exactly the same 
way as used by Leonard (1988). 

The algorithm is initiated by choosing a starting value, $\sigma^2_{(0)}$, for the variance component. 
The posterior distribution of the random effects, $\phi_i$, is then obtained by carrying out a Bayesian 
analysis as described in Section 3.1. This posterior distribution is then used to implement 
the E-step. The expected value of the sufficient statistic is calculated conditional on the data. 
The M-step is then completed by merely calculating the maximum likelihood estimate as a function of the 
sufficient statistics. For a more complete description of the EM algorithm for regular exponential 
densities, see Dempster, Laird and Rubin (1977). The process is then repeated with a Bayesian 
analysis based on the updated estimate of the variance component, $\sigma^2_{(1)}$. The algorithm 
is continued until it converges. 
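
Under the normal approximation to the posterior, one cycle of the algorithm can be written in closed form. The display below is a sketch under that assumption rather than a formula taken from the text. The symbols $I$ (the number of PSU's in the sample), $\hat{\phi}_i^{(t)}$ (the posterior mode of $\phi_i$ at iteration $t$) and $v_i^{(t)}$ (the approximate posterior variance of $\phi_i$ at iteration $t$) are notation introduced here. The sufficient statistic for $\sigma^2$ in the prior (2.4) is $\sum_i \phi_i^2$, so the E-step and M-step would be:

```latex
\begin{aligned}
\text{E-step:}\quad & E\Big[\sum_{i=1}^{I} \phi_i^2 \,\Big|\, y, \sigma^2_{(t)}\Big]
   \approx \sum_{i=1}^{I} \Big\{ \big(\hat{\phi}_i^{(t)}\big)^2 + v_i^{(t)} \Big\}, \\
\text{M-step:}\quad & \sigma^2_{(t+1)}
   = \frac{1}{I} \sum_{i=1}^{I} \Big\{ \big(\hat{\phi}_i^{(t)}\big)^2 + v_i^{(t)} \Big\}.
\end{aligned}
```

The variance term $v_i^{(t)}$ keeps the update from collapsing toward zero when the modes $\hat{\phi}_i^{(t)}$ are heavily shrunk.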


3.3 Estimates of Small Area Proportions 


Estimates together with associated posterior variances and covariances for parameters of 
the model given in (2.3-4) are presented in Sections 3.1 and 3.2. These estimated parameters 
are then employed to obtain estimates for small area proportions using a predictive approach. 
Assuming that the sample sizes within each area are small compared to those of the correspon- 
ding populations, this can be accomplished by averaging the individual estimated probabilities: 


$$\hat{p}_i = \frac{\sum_{\mu j} \hat{\pi}_{\mu ij}}{N_i} \qquad (3.14)$$


where $N_i$ is the number of individuals in the $i$-th small area, and where the estimated probability 
associated with the $\mu ij$-th individual, $\hat{\pi}_{\mu ij}$, is obtained by inverting the logistic function 
as follows, 


$$\hat{\pi}_{\mu ij} = [1 + \exp\{-(\hat{\theta}_\mu + X_{\mu ij}\hat{\beta} + \hat{\phi}_i)\}]^{-1}. \qquad (3.15)$$


To develop posterior variances for the estimates of small area proportions, it is convenient 
to adopt a more conventional notation for the linear part of the model, using dummy variables 
to indicate classifications. Let $Z_{\mu ij}$ represent a vector of predictor variables, both quantitative 
and qualitative, associated with the $\mu ij$-th individual and let $\Gamma$ represent a vector of the 
parameters of the model. Then, 

$$Z_{\mu ij}^T \Gamma = \theta_\mu + X_{\mu ij}\beta + \phi_i, \qquad (3.16)$$

$$\pi_{\mu ij} = [1 + \exp(-Z_{\mu ij}^T \Gamma)]^{-1}. \qquad (3.17)$$

Then, using a standard Taylor Series method, the posterior variance of the estimated small 
area proportion can be approximated as, 


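$$\operatorname{Var}(\hat{p}_i) \approx \frac{1}{N_i^2} \left[\sum_{\mu j} Z_{\mu ij}\, \hat{\pi}_{\mu ij}(1 - \hat{\pi}_{\mu ij})\right]^T \Sigma \left[\sum_{\mu j} Z_{\mu ij}\, \hat{\pi}_{\mu ij}(1 - \hat{\pi}_{\mu ij})\right]. \qquad (3.18)$$


Here, $\Sigma$ is the posterior covariance matrix of the estimated logistic regression parameters $\Gamma$. 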

Should the samples within small areas be substantial parts of the associated populations 
within those areas, then some additional gains in precision could be made by predicting only 
for the non-sampled units, in the spirit of the finite population sampling prediction methods 
originally described by Royall (1970). 
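
The averaging in (3.14) and the Taylor series approximation in (3.18) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `small_area_estimate` and its argument layout are assumptions, and it presumes that a design vector is available for every individual over which the average is taken.

```python
import math

def small_area_estimate(units, gamma, cov):
    """Estimated proportion and approximate posterior variance for one area.

    units : list of design vectors Z (one list of floats per individual)
    gamma : estimated parameter vector, same length as each Z
    cov   : posterior covariance matrix of gamma (list of lists)
    """
    n = len(units)
    d = len(gamma)
    total = 0.0
    c = [0.0] * d                      # sum over units of Z * pi * (1 - pi)
    for z in units:
        eta = sum(zk * gk for zk, gk in zip(z, gamma))
        pi = 1.0 / (1.0 + math.exp(-eta))
        total += pi
        w = pi * (1.0 - pi)
        for k in range(d):
            c[k] += z[k] * w
    p_hat = total / n
    quad = sum(c[a] * cov[a][b] * c[b] for a in range(d) for b in range(d))
    return p_hat, quad / n ** 2
```

The vector `c` is the gradient of the summed probabilities with respect to the parameters, which is what the first-order expansion requires.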


4. THE SIMULATION STUDY 


A simulation study was carried out to illustrate the characteristics of three different 
methodologies for producing local area estimates of proportions. The three methods evaluated 
were the classical unbiased estimates, model-based estimates similar to the straightforward 
"synthetic estimates" of Gonzalez and Hoza (1978), and a modification of the proposed 
empirical Bayes estimates described in Section 3, above. Data were simulated for a two-stage 
sample design. The 15 primary sampling units (PSU’s) were also treated as the local areas for 
which individual estimates of labour force participation rates were required. Within each of 
the 15 PSU’s, simple random samples of 25 individuals were drawn, for a total sample size 
of 375. The local area populations were assumed to be infinite so that complications associated 
with finite population sampling could be avoided. 

As evaluations for local area estimates were required, it was decided to simulate resampling 
at the second stage only. That is, the same 15 PSU’s were drawn for each of the simulation 
studies. Each replicate consisted of a different sample drawn within these PSU’s. The study 
was based on 205 replications. 

The data were generated using the model described in equation (2.3). The parameters were 
defined as follows, 


$$\theta_1 = -0.5, \qquad \theta_2 = -1.0, \qquad \beta = 0.1. \qquad (4.1)$$


The random parameters $\phi_i$ were generated from a normal distribution having mean zero and 
standard deviation 0.25. The $\pi_{\mu ij}$ were obtained by inverting the logistic transformation as 
given in equation (3.15). 

Here, $\theta_1$ and $\theta_2$ are the fixed effects associated with men and women respectively. That is, 
the odds ratio for labour force participation of men to that of women is exp[0.5] = 1.65. 
The parameter $\beta$ is the slope parameter associated with age, and the $\phi_i$ are the logistic random 
effects associated with the 15 PSU’s, or local areas. 


Table 1 

Population Labour Force Participation Rates by Local Area 

Local Area            1     2     3     4     5     6     7     8 
Participation Rate    0.79  0.79  0.96  0.88  0.90  0.95  0.86  0.96 

Local Area            9     10    11    12    13    14    15 
Participation Rate    0.61  0.87  0.81  0.91  0.94  0.92  0.83 


The predictor variables were generated with identical distributions for each of the 15 local 
areas. Age was distributed uniformly on the interval 20 to 40 years, the sex of each individual 
was drawn from a Bernoulli distribution with proportion 0.5, and the two predictor variables 
were assumed to be independently distributed. The population labour force participation rates 
for the 15 local areas are displayed in Table 1. As each local area was assumed to have the same 
distribution on the predictor variables, the only source of variation from area to area was the 
random local area effects, the $\phi_i$. The random nature of these effects can produce a substantial 
variation in local area participation rates as is particularly evidenced by local area 9. 
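
The data generation described above might be sketched as follows. This is a hedged illustration, not the authors' simulation program: the function names, the seed, and the record layout are assumptions, and the default parameter values are meant only to mirror (4.1) and should be treated as assumptions as well. The area effects are drawn once and then held fixed, so that repeated calls to `draw_replicate` resample individuals within the same 15 PSU's.

```python
import math
import random

def inv_logit(eta):
    """Invert the logit transformation."""
    return 1.0 / (1.0 + math.exp(-eta))

def draw_area_effects(n_areas, sd_phi, rng):
    """Draw the random local area effects once; they stay fixed across replicates."""
    return [rng.gauss(0.0, sd_phi) for _ in range(n_areas)]

def draw_replicate(phi, rng, n_per_area=25, theta=(-0.5, -1.0), beta=0.1):
    """Draw one second-stage sample: n_per_area individuals in each area."""
    sample = []
    for area, phi_i in enumerate(phi):
        for _ in range(n_per_area):
            sex = 0 if rng.random() < 0.5 else 1    # 0 = men, 1 = women
            age = rng.uniform(20.0, 40.0)           # uniform on 20 to 40 years
            pi = inv_logit(theta[sex] + beta * age + phi_i)
            y = 1 if rng.random() < pi else 0       # labour force membership
            sample.append((sex, area, age, y))
    return sample

rng = random.Random(1988)                 # arbitrary seed, for reproducibility
phi = draw_area_effects(15, 0.25, rng)
replicate = draw_replicate(phi, rng)      # 15 areas x 25 individuals
```

Each record is a tuple `(sex, area, age, y)`, and a study with many replicates would call `draw_replicate(phi, rng)` repeatedly with the same `phi`.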

The observed local area sample proportions were used as unbiased estimates. The synthetic 
estimator was based on the following fixed effects, logit model, 


$$\operatorname{logit}(\pi_{\mu ij}) = \theta_\mu \qquad (4.2)$$


where $\pi_{\mu ij}$ and $\theta_\mu$ are defined as for the random effects model in (2.3). Notice, only data from 
a particular local area are used to form the unbiased estimator while data are pooled from all 
local areas to obtain the synthetic estimator. However, the synthetic estimators will be biased 
to a degree which depends on the extent that model (4.2) fails to capture differences between 
local areas. 
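
The contrast between the two estimators can be made concrete with a short Python sketch. It assumes that model (4.2) contains only the classification (sex) effect, in which case the fitted probability for each sex is simply the pooled sample proportion for that sex. The function name and the record layout `(sex, area, y)` are hypothetical, and the sketch assumes that every area and both sexes appear in the sample.

```python
def unbiased_and_synthetic(sample, n_areas):
    """Return (unbiased, synthetic) lists of local area estimates.

    sample : list of (sex, area, y) with sex in {0, 1} and y in {0, 1}
    """
    pooled_n = [0, 0]
    pooled_y = [0, 0]
    for sex, area, y in sample:
        pooled_n[sex] += 1
        pooled_y[sex] += y
    # Fitted probabilities under the fixed effects model: pooled rates by sex.
    rate = [pooled_y[s] / pooled_n[s] for s in (0, 1)]

    unbiased = []
    synthetic = []
    for i in range(n_areas):
        units = [(sex, y) for sex, area, y in sample if area == i]
        # Unbiased estimate: uses only the data from area i.
        unbiased.append(sum(y for _, y in units) / len(units))
        # Synthetic estimate: pooled rates averaged over the area's own units.
        synthetic.append(sum(rate[sex] for sex, _ in units) / len(units))
    return unbiased, synthetic
```

Because the pooled rates are shared by all areas, the synthetic estimates differ between areas only through the sex composition of each area's sample.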

The third estimator studied here is a modification of the proposed empirical Bayes estimator 
described in Section 3. Due to the amount of computer time required to estimate the variance 
component associated with the local area effects, in fact, the Bayes estimator described in Sec- 
tion 3.1 was employed. The prior variance used for these estimates was the known value of 
the variance given in (4.1) used to simulate the data. As a result of this compromise, the results 
for the "empirical Bayes" estimator given below would be expected to be somewhat better 
than those which would be obtained using a true empirical Bayes estimator. However, sen- 
sitivity analyses aimed at determining the effect of changes in the prior variance indicate that 
the results which would be obtained using the empirical Bayes estimator would not be expected 
to substantially differ from those reported here for the modified estimator. 

To examine bias (in the classical sense of design-based inference), the estimates were averaged 
over all 205 replicates. Averages for each of the 15 local areas, for each estimation method, 
are presented in Figure 1. The population rates are plotted as the "True Proportions". These 
rates are almost exactly the same as the average unbiased estimates, and for the most part, 
are not visible on the graph. This confirms the unbiased nature of the classical estimates. 

The synthetic estimates do not vary much from local area to local area. As each local area 
rate is based on the same pooled, fixed parameter estimates, the only source of variability from 
local area to local area is the small variability in the realized distributions of the predictor 
variables. The bias of this estimator can be large, as for example is the case for local area 9, 
where the synthetic method has a large positive bias. On the other hand, it should be noted 
that the synthetic method could not be expected to perform very well where there is little 
variability between the local area distributions of predictor variables. 

Figure 1. Averages of the estimated labour force participation rates for each of the three estimation 
methods plotted by local area 

The averages of the proposed estimates are in between the two extremes of the unbiased 
and synthetic estimates. They are biased, again in the classical sense, but their biases are smaller 
than those of the fixed effects model synthetic estimators. 

Empirical Root Mean Square Errors (RMSE) were also calculated for each of the three esti- 
mators. These are presented in Figure 2. This plot demonstrates graphically where the synthetic 
estimator performs well and where it performs poorly. For local areas 7 and 10, where the local 
area effect is close to zero, the expected value of the synthetic estimator is very close to the popula- 
tion proportion. In these areas, the synthetic estimator has by far the smallest RMSE. By pooling 
data from the whole sample, it obtains a small sampling variance. On the other hand, in local 
area 9 where the local area effect is quite large, the associated RMSE for the synthetic estimator 
is also very large, due to its large bias. The modified empirical Bayes estimator obtains most 
of the reduction in RMSE that results from pooling the data across local areas, without suf- 
fering from the large bias associated with the synthetic estimator in those areas with large local 
area effects. In all but two cases, the modified empirical Bayes estimator achieves a smaller RMSE 
than the unbiased estimator. For local area 3, the RMSE’s for the two estimators are about the 
same, and for local area 9, with a large local area effect, that of the modified empirical Bayes 
estimator is somewhat larger than that of the unbiased estimator. In short, the modified empirical 
Bayes estimator is sometimes the best of the three and never the worst. 

Figure 2. Empirical Root Mean Square Errors associated with each of the three estimation 
techniques plotted by local area 
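
For reference, the empirical bias and empirical RMSE for one local area and one estimation method can be computed from the replicate estimates as in the sketch below. The function name `bias_and_rmse` is an assumption introduced here, and the sketch simply applies the usual definitions over the replicates.

```python
import math

def bias_and_rmse(estimates, truth):
    """Empirical bias and root mean square error over replicates.

    estimates : one estimate per replicate for a single local area
    truth     : the known population proportion for that area
    """
    r = len(estimates)
    bias = sum(estimates) / r - truth
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / r)
    return bias, rmse
```

Since the mean square error is the sampling variance plus the squared bias, an estimator with a small variance can still have a large RMSE in an area where its bias is large.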


One of the principal shortcomings of the usual, fixed effects synthetic estimators is the dif- 
ficulty in obtaining useful measures of associated accuracy. One can only obtain measures of 
sampling variances. Measures of bias which reflect model inadequacies are not available. For 
unbiased estimates, on the other hand, the usual estimates of sampling variability are also mean 
square error estimates as there is no bias. For empirical Bayes estimates, measures of uncer- 
tainty are available from the posterior covariance matrix of the parameters. These posterior 
variances reflect sampling variability as well as the "bias" which comes from simple fixed effects 
model inadequacies. This latter source of uncertainty is captured via the variability in the local 
area effects parameters. 

The usefulness of these measures of uncertainty is compared graphically in Figure 3. The 
vertical axis corresponds to the empirical root mean square error (RMSE) which is obtained 
by comparing the individual replicate estimates with the known population proportions for 
each local area. The horizontal axis corresponds to the "reported RMSE". For the classical 
unbiased estimates, these are merely the sampling standard deviations for simple random 
sampling. For the synthetic estimates, they are also sampling standard deviations, corrected 
for the cluster sampling. The "reported RMSE’s" for empirical Bayes estimates are the square 
roots of the posterior variances of the estimated proportions which were obtained using the 
methods described in Section 3.3 above. 

Note that the points corresponding to the unbiased estimates lie along a line indicating 
that the reported RMSE’s are very close to the empirical RMSE’s. This is as expected since 
there is no bias in these, so the reported RMSE’s and the empirical RMSE’s are merely 
sampling standard deviations. As opposed to this, the points corresponding to the synthetic 
estimates are in a cluster above 0.015 to 0.020 on the horizontal axis. For these estimates, 
the "Reported RMSE’s" are estimates of sampling standard deviations, which for these pooled 
estimates are all quite small. However, the empirical RMSE’s for these estimates are quite a 
different story. They range from 0.015 to 0.100, with one outlier in excess of 0.250 (local area 
9). Sampling variances alone are not sufficient to describe the uncertainty associated with the 
estimates. 

Figure 3. Empirical Mean Square Error vs "Reported Mean Square Errors" for each of the three 
estimation techniques 

The case for the modified empirical Bayes estimators is again in between these two extremes. 
However, with respect to the relationship between reported RMSE and empirical RMSE it is 
much closer to the corresponding relationship for the unbiased estimators. With the exception 
of the point associated with local area 9, the average reported RMSE’s are very close to 
the corresponding empirical RMSE’s. 


5. CONCLUSIONS 


In the simple simulation of a two-stage sample where PSU’s correspond to local areas, the 
modified empirical Bayes estimators have been shown to be superior overall to two standard 
alternatives. These have been evaluated in three ways: design-bias, root mean square error, 
and validity of estimable measures of uncertainty. The classical estimator is shown to be superior 
in terms of design-bias, as expected since it is design unbiased. In addition, valid estimates of 
RMSE’s are available using standard techniques. However, these estimators suffer from large 
RMSE’s due to the fact that they are formed from limited amounts of data. Indeed, unlike 
the other two alternatives, no estimates can be formed at all for local areas not in the sample. 

At the other extreme, the synthetic estimator is far more stable than either of its competitors. 
Since all estimates are based on data from the whole sample, associated sampling variances 
are much smaller than those of the other two estimators. On the other hand, this estimator 
is unable to adjust for local areas which are quite different from the rest. This is the case, even 
when data are available in the sample that would indicate such a difference. As important, 
estimates of uncertainty in the form of sampling standard deviations for this estimator are par- 
ticularly misleading since they are unable to account for departures from the fixed effects model. 

As a compromise between these two estimators, the modified empirical Bayes estimator per- 
forms well on all three assessments. By using the data from the specific local areas to the extent 
it is reliable, this estimator avoids the large biases associated with the synthetic estimator. On 
the other hand, by pooling information from the whole sample, it has smaller sampling 
variances than the unbiased estimator, and generally smaller RMSE’s. Finally, posterior 
variances are available as useful measures of uncertainty. 

Several tasks remain in the investigation of the proposed estimators. First, the effect of using 
true empirical Bayes estimators instead of modified ones must be assessed. Some guidelines 
for minimum number of sampling units for valid empirical Bayes inference are required. True 
empirical Bayes estimates employ estimated prior variances and methods which account for 
this additional uncertainty are required. For example, the bootstrap techniques investigated 
by Laird and Louis (1987) could be used. Second, the estimation techniques need to be 
generalized to handle three and more stages of sampling. While the theoretical extension is 
trivial, the computational implications are not. Finally, these techniques must be applied to real 
data before recommending their adoption as a standard alternative for local area estimation. 


Updating Size Measures in a PPSWOR Design 


ALAN SUNTER¹ 


ABSTRACT 


It is sometimes required that a PPSWOR sample of first stage units (psu’s) in a multistage population 
survey design be updated to take account of new size measures that have become available for the whole 
population of such units. However, because of a considerable investment in within-psu mapping, segmentation, 
listing, enumerator recruitment, etc., we would like to retain the same sample psu’s if possible, 
consistent with the requirement that selection probabilities may now be regarded as being proportional 
to the new size measures. The method described in this article differs from methods already described 
in the literature in that it is valid for any sample size and does not require enumeration of all possible 
samples. Further, it does not require that the old and the new sampling methods be the same and hence 
it provides a convenient way not only of updating size measures but also of switching to a new sampling 
method. 


KEY WORDS: PPSWOR; Sample updating; PPS sequential sampling. 


1. INTRODUCTION 


It is sometimes required that a PPSWOR sample of first stage units (psu’s) in a multistage 
population survey design be updated to take account of new size measures that have become 
available for the whole population of such units. This occurs, for example, when the psu’s are 
census enumeration areas (or collections of census enumeration areas) and a new census has 
made new population/housing counts available or when, because of observed uneven growth 
in EA populations in an intercensal period, it is decided to do an interim update of size measures 
in a sampling stratum. However, because of a considerable investment in within-psu mapping, 
segmentation, listing, enumerator recruitment, etc., we would like to retain the same sample 
psu’s if possible, consistent with the requirement that selection probabilities, originally pro- 
portional to the old size measures, may now be regarded as being proportional to the new ones. 
A comprehensive treatment of the problem for n = 1 is given by Kish and Scott (1971) and 
is itself a generalization of a method given earlier by Keyfitz (1951). They point out that their 
method may be extended without difficulty to with replacement sampling (PPSWR) for n > 1. 
Their method may also be used (Drew, Choudhry, and Gray 1978; Platek and Singh 1978) for 
n > 1 when the PPSWOR procedure used is that due to Rao, Hartley and Cochran (1962), 
since this method involves the formation of n random groups and subsequent selection of a 
single psu from each group. It breaks down however if we wish, as indeed we probably would, 
to form new random groups according to the new size measures. Fellegi (1966) provides two 
methods applicable to a PPSWOR sample of n = 2 drawn by the Fellegi (1963) procedure. 

The method given in this paper is similar to the second Fellegi method and, when applied 
to the examples in the Fellegi paper, gives very similar results. Unlike that method, however, 
it does not require the enumeration of all possible samples and hence is a feasible procedure 
for any value of n and N. Although it is formally applicable to any PPSWOR method for which 
it is feasible to calculate the selection probability of any sample selected it has its highest utility 
for PPSWOR methods in which all, or nearly all, n-tuple subsets are possible samples with 
probabilities approximately proportional to the product of their unit probabilities. The method 
of this type, used for purposes of illustration, is the author’s pps sequential method (Sunter 
1986, 1989). 


2. REPLACEMENT PROCEDURE THEORY 


We wish to reselect a PPSWOR sample, originally selected with probabilities 
$\{\pi_{11}, \pi_{12}, \ldots, \pi_{1N}\}$ proportional to original size measures $\{Z_{11}, Z_{12}, \ldots, Z_{1N}\}$, 
under a new set of probabilities $\{\pi_{21}, \pi_{22}, \ldots, \pi_{2N}\}$ proportional to new size measures 
$\{Z_{21}, Z_{22}, \ldots, Z_{2N}\}$. However, we want to do this in such a way that we have a high probability 
of retaining the original sample. 

We assume that for any particular n-tuple $S$, including of course $S'$, the original sample 
actually selected, it is possible to calculate both $P_1(S)$, its selection probability under the 
original scheme, and $P_2(S)$, its selection probability under a new scheme. For many samples 
in many schemes (e.g. pps systematic) one or both of these probabilities may be zero although, 
obviously, $P_1(S')$ cannot be zero. 


The procedure is as follows: 

Step 1: (a) Calculate $P_1(S')$ and $P_2(S')$. 

        (b) If $P_2(S') \ge P_1(S')$ then retain the sample. 

        (c) If $P_2(S') < P_1(S')$ retain the sample with probability $P_2(S')/P_1(S')$. If rejected proceed to Step 2. 

Step 2: (a) If the original sample was not retained then draw a new sample, $S_1$ say, with probability $P_2(S_1)$. If $P_2(S_1) < P_1(S_1)$ then reject the sample, otherwise retain with probability $1 - P_1(S_1)/P_2(S_1)$. If rejected proceed to Step 2(b). 

        (b) If the Step 2(a) sample was not retained then draw a new sample, $S_2$ say, and proceed as for Step 2(a). 

        (c), (d), ... Repeat the Step 2(a), 2(b), ... procedure until a sample is retained. 
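
The two steps above can be sketched in code. The following Python function is a minimal illustration, not the author's program: the names `reselect`, `p1`, `p2` and `draw_new` are hypothetical, and it assumes the caller can evaluate $P_1(S)$ and $P_2(S)$ for any sample and can draw a sample under the new scheme.

```python
import random

def reselect(original, p1, p2, draw_new, rng=random):
    """Sketch of Steps 1 and 2; returns (sample, original_was_retained).

    original : the sample S' actually selected under the old scheme
    p1, p2   : hypothetical callables returning P1(S) and P2(S)
    draw_new : hypothetical callable drawing a sample with probability P2(S)
    """
    old, new = p1(original), p2(original)
    # Step 1: retain S' outright if P2 >= P1, else with probability P2/P1.
    if new >= old or rng.random() < new / old:
        return original, True
    # Step 2: draw under the new scheme until a draw is retained.
    while True:
        s = draw_new()
        old, new = p1(s), p2(s)
        # Reject if P2 <= P1 (retention probability is zero when they are
        # equal); otherwise retain with probability 1 - P1/P2.
        if new > old and rng.random() < 1 - old / new:
            return s, False
```

Because a rejection at Step 1 implies that some sample has $P_2 > P_1$, the Step 2 loop terminates with probability one.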


The sample eventually retained by this process has the required probability structure for both unit probabilities and unit pair joint probabilities. In other words, it may be regarded as having been drawn under the new scheme. In particular, since it has the same joint probability structure, it has the same sampling variance. 

Let P* denote the probability that the process does not terminate at Step 1, P** the condition- 
al probability that it does not terminate at Step 2(a) given that it did not terminate at Step 1. 
Obviously P** is then also the conditional probability that the process does not terminate at 
any subsequent step given that it did not terminate at any step preceding that step. We now have 


$$P^{*} = \sum_{i:\,P_2(S_i) < P_1(S_i)} \bigl(1 - P_2(S_i)/P_1(S_i)\bigr)\,P_1(S_i) = \sum_{i:\,P_2(S_i) < P_1(S_i)} \bigl(P_1(S_i) - P_2(S_i)\bigr) \qquad (1)$$

where i now indexes the n-tuple subsets of the N population units, and 

$$P^{**} = 1 - \sum_{i:\,P_1(S_i) < P_2(S_i)} \bigl(1 - P_1(S_i)/P_2(S_i)\bigr)\,P_2(S_i) = 1 - \sum_{i:\,P_1(S_i) < P_2(S_i)} \bigl(P_2(S_i) - P_1(S_i)\bigr) \qquad (2)$$

while, since $\sum_i P_1(S_i) = \sum_i P_2(S_i) = 1$, it is easy to see that the summation terms on the right of (1) and (2) respectively must be equal and we have $P^{*} = 1 - P^{**}$. 
Denoting ultimate selection probability by $P'$ we now have, by design: 

For $i: P_2(S_i) < P_1(S_i)$ 

$$P'(S_i) = P_1(S_i)\,\bigl(P_2(S_i)/P_1(S_i)\bigr) = P_2(S_i), \text{ as required.}$$

For $i: P_2(S_i) \ge P_1(S_i)$ 

$$\begin{aligned}
P'(S_i) &= P_1(S_i) + P^{*}\bigl(1 - P_1(S_i)/P_2(S_i)\bigr)P_2(S_i) \\
&\quad + P^{*}P^{**}\bigl(1 - P_1(S_i)/P_2(S_i)\bigr)P_2(S_i) \\
&\quad + P^{*}(P^{**})^{2}\bigl(1 - P_1(S_i)/P_2(S_i)\bigr)P_2(S_i) \\
&\quad + P^{*}(P^{**})^{3}\bigl(1 - P_1(S_i)/P_2(S_i)\bigr)P_2(S_i) + \cdots \\
&= P_1(S_i) + P^{*}\bigl(P_2(S_i) - P_1(S_i)\bigr)/(1 - P^{**}) \\
&= P_2(S_i)
\end{aligned}$$

as required. 

Finally, we observe that the expected number of Step 2 "trials", given that the original sample was not retained at Step 1, is given by the binomial waiting time distribution as $1/(1 - P^{**}) = 1/P^{*}$. 
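
As a check on the algebra, the quantities in (1) and (2) and the ultimate selection probabilities can be evaluated exactly for a small case. The sketch below uses a hypothetical three-sample universe with made-up probabilities (not taken from the paper) and exact rational arithmetic; the function name `reselection_summary` is illustrative only.

```python
from fractions import Fraction as F

def reselection_summary(p1, p2):
    """p1, p2: dicts mapping every possible sample to P1(S) and P2(S)."""
    # Equation (1): probability that Step 1 does not retain the original sample.
    p_star = sum(p1[s] - p2[s] for s in p1 if p2[s] < p1[s])
    # Equation (2): probability that a single Step 2 draw is rejected.
    p_2star = 1 - sum(p2[s] - p1[s] for s in p1 if p1[s] < p2[s])
    if p_star == 0:
        # Schemes coincide: Step 1 always retains, so P' = P1 = P2.
        return p_star, p_2star, dict(p1)
    ultimate = {}
    for s in p1:
        if p2[s] < p1[s]:
            # Reachable only through Step 1.
            ultimate[s] = p1[s] * (p2[s] / p1[s])
        else:
            # Step 1 plus the summed geometric series over Step 2 draws.
            ultimate[s] = p1[s] + p_star * (p2[s] - p1[s]) / (1 - p_2star)
    return p_star, p_2star, ultimate

# Hypothetical probabilities for three possible samples.
p1 = {"a": F(1, 2), "b": F(3, 10), "c": F(1, 5)}
p2 = {"a": F(1, 5), "b": F(3, 10), "c": F(1, 2)}
p_star, p_2star, ultimate = reselection_summary(p1, p2)
expected_trials = 1 / p_star  # expected number of Step 2 trials, 1/P*
```

In this toy case the ultimate probabilities coincide with `p2`, and `p_star` equals `1 - p_2star`, as the derivation requires.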


3. APPLICATION AND EXAMPLES 


The new scheme need not be the same (even apart from the change in unit probabilities) 
as the old one. We could switch, for example, from a sample originally drawn under pps 
systematic sampling to one drawn under the author’s (Sunter 1986, 1989) pps sequential scheme 
or even from PPSWR (pps with replacement) to a PPSWOR scheme. In the latter case, of 
course, an original sample with multiple inclusions of a single psu has zero probability of selec- 
tion in the new PPSWOR scheme. The procedure may still be used, it may be noted, even if 
we have included new psu’s in the stratum but are retaining the same sample size. 

The procedure probably has its highest practical utility, as measured by its probability of 
retaining the same sample, when both the old and the new schemes are such that all, or nearly 
all, samples are possible and their probabilities are approximately proportional to the product 
of their unit selection probabilities. Under these circumstances, and provided that the changes 
in size measures are not extreme, $P_1(S_i)$ and $P_2(S_i)$ tend to have about the same values so that 
the probability of retaining the same sample will be relatively high. A practical PPSWOR 
method with the required properties is the author's, referred to above. Since we will use this 
method in the examples of the next section, we now describe it. There are two variants, in both 
of which we have to find a suitable ordering of the population and accumulate the size measures 
(which we assume to be scaled to sum to 1), in reverse order (so to speak), to give: 


$$Z_i = \sum_{j=i}^{N} z_j, \qquad i = 1, 2, \ldots, N.$$


Variant 1: Order the population in any way such that 
(a) nz; = Z;; § =US2Q Nin 


(b) (nu. — 1)z; < Zee ht A SEN — 1. 
Then select units until exactly n have been selected according to: 

$$P(U_i \mid n_i) = \begin{cases} 1 & \text{if } n_i = N - i + 1 \\ n_i z_i / Z_i & \text{otherwise} \end{cases}$$

where $n_i$ is the number of sample units still required to be selected when we arrive at the 
i-th population unit. 


It is always possible to satisfy the ordering requirements (a) and (b). For example ordering 
by increasing size obviously satisfies both as does ordering by decreasing size down to the point 
(if any) at which (b) fails and then by increasing size. The latter ordering has some advantage 
in that it tends to minimize the slight (and, for practical purposes, negligible) deviation from 
strict pps for the last n units (see Sunter 1986). Variant 2 avoids these deviations altogether 
by taking advantage of the fact that if it occurs that there are $n_i + 1$ units remaining in the 
population for any i, then it is usually possible to simply discard one of these units with 
appropriate probability and retain the others. 


Variant 2: Order the population in any way such that 
(ayng SZ Or Fe aN =I 
(cb i. Wh tempsl) 23a SoSeininerdacael\ nep its 
Then 


(i) select according to $P(U_i \mid n_i) = n_i z_i / Z_i$ until either $n_i = 0$ or $n_i = N - i$, then 

(ii) if $n_i > 0$ discard one of the remaining units, say that indexed j, with 
probability $1 - n_i z_j / Z_i$ and select the others. 

An algorithm for finding an ordering satisfying the requirements for Variant 2 is given in 
Sunter (1986) and is incorporated in the program used for the simulations of the next section. 
In both variants $\pi_{ij}$ may be calculated according to 

$$\pi_{ij} = n(n - 1)\,\gamma_i z_i z_j$$

where i < j (in the indexing of the ordering actually used) and 

$$\gamma_1 = 1/Z_2, \qquad \gamma_i = \gamma_{i-1}\,(1 - z_{i-1}/Z_i)\,Z_i/Z_{i+1}.$$

These expressions are exact for i < j < N - n + 1, and provide a very close approxima- 
tion otherwise. They are easily calculated and give the method the advantage, unique among 
practical procedures for PPSWOR with n > 2, of the availability of variance estimation with 
negligible bias. 

Pascal-like pseudocode for a routine that selects a sample according to Variant 1, at the 
same time calculating its probability and the value of $\gamma_i$ for each selected unit, is given in an 
Appendix. It is easily extended to Variant 2 or modified to the calculation of P(S) for an already 
selected sample. 


3.1 Example 1 


To illustrate these procedures we take first an example with n = 2 and N = 4, small enough 
for sample enumeration and manual calculation, where it will be seen that, in order to obtain 
the "new" size measures, we simply inverted the order of the original assignment. The Variant 
2 ordering algorithm mentioned above gives (4,1,2,3) for the first set of size measures and 
(1,4,3,2) for the second. There are six possible samples, listed in column (1) of Table 2, whose 
probabilities under the Variant 2 algorithm are easily calculated, with results shown in columns 
(2) and (3). Column (4) gives the probability of retaining this sample at Step 1, given that it 
was the original selection. Column (5) gives the conditional probability of retention at any 
subsequent Step 2, given that no sample was retained at a preceding step. 

It may be verified that the overall probability of retention of the same sample, given by the 
sum of the products of the values in columns (2) and (4), is 0.5465. This value may be compared 
with the overall probability of retention of the same sample when the new sample is 
selected independently, given by $\sum_i P_1(S_i) P_2(S_i) = 0.1168$. Thus even in this rather extreme 
example, we have considerably increased the likelihood of retaining the same sample. 


Table 1 
Selection Probabilities 

PSU     $z_{1i}$     $z_{2i}$ 

1       0.15         0.35 
2       0.20         0.30 
3       0.30         0.20 
4       0.35         0.15 


Table 2 

(1)       (2)          (3)          (4)                 (5) 
Sample    $P_1(S)$     $P_2(S)$     Retention prob.     Retention prob. 
                                    at Step 1           at Step 2 

1,2       0.0231       0.3231       1.0                 0.9286 
1,3       0.1154       0.2154       1.0                 0.4643 
1,4       0.1615       0.1615       1.0                 0 
2,3       0.1615       0.1615       1.0                 0 
2,4       0.2154       0.1154       0.5357              0 
3,4       0.3231       0.0231       0.0715              0 


3.2 Example 2 


In a more realistic set of examples we now take n = 4 for a population of 100 psu's with 
"original" size measures independently assigned from the uniform or rectangular distribu- 
tion R(1,3). "New" size measures are assigned in a number of ways, described below. For these 
examples it is no longer feasible to enumerate all possible samples or to perform the sample 
selection and sample probability calculations manually. However, writing a computer program 
to do the latter and to apply the reselection procedure was a straightforward task. The pro- 
gram was used to perform 200 iterations, for each example, of selection of a sample using 
Sunter's Variant 2 with probabilities proportional to the first set of size measures with subse- 
quent application of the procedures described above for reselection of a sample with probabil- 
ities proportional to the second set of size measures. The program, running on an XT-compatible 
operating at 7.16 MHz, generated and sorted the populations of size measures and performed 
200 iterations of the sample selection and reselection in about three minutes. 

Case 1, in which we have assigned new size measures from the same distribution indepen- 
dently of their original values, may be seen as a "worst practical case" scenario. Case 2, in 
which 10% of the psu's have doubled in size with the rest remaining unchanged, is an approx- 
imation of a "scattered development" scenario. Case 3 illustrates the random perturbation 
of size measures by an amount rectangularly distributed over an interval equal to the original 
size measure. From Table 3 it may be seen that with probabilities ranging from 0.67 in the 
"worst case" scenario to 0.81 in the "scattered development" scenario, we retain the original 
sample. For those cases in which the original sample is rejected the average number of Step 
2 trials required to select a new sample agreed closely with the predicted value of $1/P^{*}$. 


Table 3 

200 Iterations of a Size Measure Update Procedure, n = 4, N = 100; 
Original Size Measures from R(1,3) 

Case   Source of $z_{2i}$                        Step 1        Average Step 2   Estimated 
                                                 Retentions    Trials           $P^{*}$ 

1      $z_{2i} = R(1,3)$                         134           2.98             0.33 
2      $z_{2i} = 2 z_{1i}$ for 10% of psu's      153           5.53             0.19 
3      $z_{2i} = R(z_{1i}/2,\ 3z_{1i}/2)$        154           4.17             0.25 

APPENDIX 


Pseudocode for Variant 1 of PPS Sequential Sampling 


It is assumed here that the population of size measures has already been given a suitable 
ordering, say by the algorithm given in Sunter (1986) and that its index, i, in this ordering iden- 
tifies the unit. Size measures, scaled to sum to 1, are stored in an array z[1..PopSize] with 
their cumulative values (accumulated from PopSize down to 1) stored in an array 
Z[1..PopSize]. The meaning of the variables will be clear from the names that they are given. 
The results are to be stored in an array Sample[1..SamSize,1..3] in which the elements are 
population index i, unit probability $\pi_i$, and $\gamma_i$ respectively. "Random" is a function that 
returns a random number uniformly distributed on the interval (0,1). The indentations in 
the code written below are intended to facilitate the visual pairing of the begin/end's that 
delineate a compound statement. 


{ Variables initialization } 
i = 1; SamProb = 1; NumRem = SamSize; Gamma = 1/Z[2]; 
{ Sampling routine } 
while NumRem > 0 do 
begin 
   { Update Gamma for the current unit } 
   if i > 1 and i < PopSize then 
      Gamma = Gamma*(1 - z[i - 1]/Z[i])*Z[i]/Z[i + 1]; 
   { Select with certainty when only NumRem units remain } 
   if i = PopSize - NumRem + 1 or Random <= NumRem*z[i]/Z[i] 
   then 
   begin 
      if i <> PopSize - NumRem + 1 then 
         SamProb = SamProb*NumRem*z[i]/Z[i]; 
      NumRem = NumRem - 1; 
      Sample[SamSize - NumRem,1] = i; 
      Sample[SamSize - NumRem,2] = SamSize*z[i]; 
      Sample[SamSize - NumRem,3] = Gamma; 
   end else SamProb = SamProb*(1 - NumRem*z[i]/Z[i]); 
   i = i + 1; 
end. 
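
For readers who want something executable, the pseudocode above can be transcribed roughly as follows. This Python sketch is an illustration under stated assumptions rather than the author's program: it uses 0-based indices, omits the $\pi_i$ and $\gamma_i$ bookkeeping, assumes the size measures are already suitably ordered and scaled to sum to 1, and the function names are hypothetical. The second function is one way of adapting the routine to compute P(S) for an already selected sample.

```python
import random
from itertools import combinations

def reverse_cumulate(z):
    """Z[i] = z[i] + z[i+1] + ... + z[N-1] (0-based indices)."""
    Z, total = [0.0] * len(z), 0.0
    for i in range(len(z) - 1, -1, -1):
        total += z[i]
        Z[i] = total
    return Z

def variant1_select(z, n, rng=random):
    """Select n units sequentially; return (0-based indices, sample probability)."""
    N, Z = len(z), reverse_cumulate(z)
    sample, prob, rem, i = [], 1.0, n, 0
    while rem > 0:
        forced = (i == N - rem)      # only `rem` units are left: take them all
        p = rem * z[i] / Z[i]
        if forced or rng.random() <= p:
            if not forced:
                prob *= p
            sample.append(i)
            rem -= 1
        else:
            prob *= 1 - p
        i += 1
    return sample, prob

def variant1_probability(z, n, sample):
    """P(S) under the same rule for an already selected sample of size n."""
    N, Z = len(z), reverse_cumulate(z)
    chosen, prob, rem, i = set(sample), 1.0, n, 0
    while rem > 0:
        p = rem * z[i] / Z[i]
        if i == N - rem:             # unit would be taken with certainty
            if i not in chosen:
                return 0.0
            rem -= 1
        elif i in chosen:
            prob *= p
            rem -= 1
        else:
            prob *= 1 - p
        i += 1
    return prob

# Hypothetical size measures in increasing order, n = 2.
z = [0.1, 0.2, 0.3, 0.4]
total = sum(variant1_probability(z, 2, s) for s in combinations(range(4), 2))
```

Summing `variant1_probability` over all n-subsets, as in the last line, is a quick sanity check that the sample probabilities form a proper distribution.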

The Use of Administrative Records for 
Estimating Population in Canada¹ 


RAVI B.P. VERMA and RONALD RABY² 


ABSTRACT 


This paper examines the adequacy of estimates of emigrants from Canada and interprovincial migra- 
tion data from the Family Allowance files and Revenue Canada tax files. The application of these data 
files in estimating total population for Canada, provinces and territories, was evaluated with reference 
to the 1986 Census counts. It was found that these two administrative files provided consistent and 
reasonably accurate series of data on emigration and interprovincial migration from 1981 to 1986. 
Consequently, the population estimates were fairly accurate. The estimate of emigrants derived from 
the Family Allowance file could be improved by using the ratio of adult to child emigrant rates com- 
puted from Employment and Immigration Canada’s immigration file. 


KEY WORDS: Interprovincial migration; Emigration; Population estimates; Census counts; Accuracy. 


1. INTRODUCTION 


The national Census, conducted every five years since 1951, provides a wide range of 
demographic data on the Canadian population. However, unlike some other industrialized 
countries, Canada does not have a continuous population registration to derive basic 
demographic data and track the movement of people over different geographic areas for non- 
census years. To fill this gap, since the 1940s Statistics Canada has developed a program of 
population and family estimates. For example, population estimates for Canada, provinces 
and territories, census divisions, and census metropolitan areas are produced using the latest 
census counts and several administrative data sources, including: Revenue Canada tax files 
and Family Allowance files for migration; Vital Statistics registration for births and deaths; 
and Immigrant Visa and Record of Landing Registration for immigration. 

The strengths and weaknesses of these administrative files for estimating population and 
migration compared with 1981 Census data have been discussed elsewhere (Statistics Canada 
1987; Verma and Parent 1985; Norris, Britton and Verma 1982). In this paper, the accuracy 
of estimates of the components of population change for provinces and territories using the 
Family Allowance and Revenue Canada data sources will be evaluated by comparison with 
the 1986 Census counts. This evaluation will compare 1971, 1976 and 1981 data. 

The paper is presented in the following sections: data sources and the methods of estima- 
tion; results of the evaluation; and conclusions and discussion. 


2. DATA SOURCES AND THE METHODS OF ESTIMATION 


This section describes the procedures for estimating total population, interprovincial migra- 
tion, and emigration. 


¹ Revised version of a paper presented at Statistics Canada Symposium on Statistical Uses of Administrative Data, 
November 1987. 

² Ravi B.P. Verma and Ronald Raby, Demography Division, Statistics Canada, 4-A Jean Talon Building, Ottawa, 
Ontario, K1A 0T6. 

2.1 Total Population 


Quarterly and annual estimates of the total population of Canada and the provinces and 
territories, and annual totals for census divisions and census metropolitan areas, are produced 
by the component method. At the national level, the numbers of births and immigrants are added 
to, and the numbers of deaths and emigrants subtracted from, the base population (taken from 
the latest Census of Canada). By province and for smaller areas, estimates of internal migra- 
tion are also taken into account. 


The component method is expressed as follows: 

$$P(t + i) = P(t) + [B(t,t+i) - D(t,t+i) + I(t,t+i) - E(t,t+i)] + N(t,t+i). \qquad (1)$$

Where, for any given province: 

P(t + i) = estimate of population at time t + i 
P(t) = Census population counts at time t 
B = number of births between time t and t + i 
D = number of deaths between time t and t + i 
I = number of immigrants between time t and t + i 
E = number of emigrants between time t and t + i 
N = number of net interprovincial migrants between time t and t + i 

(t,t + i) = interval between the last census date and the reference date of the estimate. 
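
The arithmetic of the component method is simple enough to show in a few lines. The Python sketch below is only an illustration of equation (1); the function name and all figures are hypothetical and are not Statistics Canada data.

```python
def component_estimate(base, births, deaths, immigrants, emigrants,
                       net_interprovincial=0):
    """Equation (1): P(t+i) = P(t) + [B - D + I - E] + N.

    All components refer to the interval (t, t+i). For a national estimate
    the net interprovincial term is zero by definition.
    """
    return base + (births - deaths + immigrants - emigrants) + net_interprovincial

# Hypothetical province: census base of 1,000,000 and made-up components.
estimate = component_estimate(
    base=1_000_000, births=15_000, deaths=8_000,
    immigrants=6_000, emigrants=2_500, net_interprovincial=-1_200,
)
```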


2.2 Interprovincial Migration 


Two administrative files are used to produce annual and quarterly estimates of interprovin- 
cial migration. Preliminary estimates are derived from Family Allowance files, while final 
figures are estimated from Revenue Canada income tax files. 


2.2.1 Preliminary Estimates 


The number of adult migrants is estimated using child migration figures derived from Family 
Allowance files, and ratios of adult out-migration rates to child out-migration rates ($f_{j,k}$) 
based on the most recent Revenue Canada tax file (calculated for 1 or 2 years before the reference 
date). Recipients of Family Allowance cheques must notify the Department of Health 
and Welfare of changes in address. These changes are compiled monthly for both province 
of origin and destination, by size of family (the number of children per family receiving the 
allowance). Coverage of the population by Family Allowance is comparable to that of the 
census (Statistics Canada 1987, p. 46). Estimates of the number of interprovincial out-migrants 
for all age groups are calculated as follows: 


$$\hat{M}_{(j,k),18+} = \frac{M_{(j,k),0-17}}{P_{j,0-17}} \; f_{j,k} \; P_{j,18+} \qquad (2)$$

$$f_{j,k} = \frac{M'_{(j,k),18+}}{P'_{j,18+}} \bigg/ \frac{M'_{(j,k),0-17}}{P'_{j,0-17}} \qquad (3)$$

$$\hat{M}_{(j,k),0+} = \hat{M}_{(j,k),18+} + M_{(j,k),0-17} \qquad (4)$$

where: 

$\hat{M}_{(j,k),0+}$ = estimated total number of persons out-migrating from province j to province k 

$\hat{M}_{(j,k),18+}$ = estimated number of adult out-migrants (aged 18+) from province j to 
province k 

$M'_{(j,k),18+}$ = number of adult out-migrants from province j to province k derived from 
Revenue Canada tax files 

$M'_{(j,k),0-17}$ = number of child out-migrants (aged 0-17) from province j to province k derived 
from Revenue Canada tax files 

$M_{(j,k),0-17}$ = number of child out-migrants from province j to province k, based on Family 
Allowance files 

$P_{j,18+}$ = estimated number of adults in province j, the difference between the total 
population estimates and estimates of the child population based on Family 
Allowance files 

$P_{j,0-17}$ = total number of children receiving Family Allowance payments in province j 

$f_{j,k}$ = estimation factor for adult migrants from province of origin j to province 
of destination k, based on estimates of migration from Revenue Canada tax 
files 

$P'_{j,18+}$ = number of adults in province j, Demography Division population estimates 

$P'_{j,0-17}$ = number of children in province j, Demography Division population estimates. 
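
A short numerical sketch may help to fix the roles of the quantities in equations (2) to (4). The Python below is illustrative only: the function names are hypothetical and the figures are invented, chosen so that the adult out-migration rate in the tax data is 10% higher than the child rate.

```python
def adult_child_factor(adult_migrants_tax, adult_pop, child_migrants_tax, child_pop):
    """Equation (3): ratio of the adult to the child out-migration rate (tax files)."""
    return (adult_migrants_tax / adult_pop) / (child_migrants_tax / child_pop)

def preliminary_out_migrants(f_jk, child_migrants_fa, child_pop_fa, adult_pop_fa):
    """Equations (2) and (4): estimated adult and total out-migrants from j to k."""
    adults = f_jk * (child_migrants_fa / child_pop_fa) * adult_pop_fa   # (2)
    return adults, adults + child_migrants_fa                           # (4)

# Hypothetical inputs for one origin-destination pair (j, k).
f_jk = adult_child_factor(adult_migrants_tax=1_320, adult_pop=600_000,
                          child_migrants_tax=400, child_pop=200_000)
adults, total = preliminary_out_migrants(f_jk, child_migrants_fa=420,
                                         child_pop_fa=210_000,
                                         adult_pop_fa=590_000)
```

With these made-up inputs the factor is 1.1, so the Family Allowance child rate of 0.002 is scaled up to an adult rate of 0.0022 before being applied to the adult population.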


2.2.2 Final Estimates 


Revenue Canada tax files are used to produce final estimates of interprovincial migrants. 
All individuals receiving an annual income above a specified minimum are required to file an 
income tax return by the end of April of each year. Migrant tax filers are identified by com- 
paring area of residence from two consecutive tax returns. Information on the number and 
ages of dependents is imputed from the total amount of personal exemptions claimed by filers. 
An adjustment is made for segments of the population not covered by the Revenue Canada 
system; this includes people who neither file an income tax return nor appear as dependents 
in another filer’s return (Norris and Standish 1983; Statistics Canada 1987). 


2.3 Emigration 


In Canada no system exists for recording emigrants; hence, their numbers must be estimated. 
Revenue Canada income tax files with an "out-of-Canada" address one year and an "in-Canada" 
address for the previous year are used to identify emigrants. The emigrant status of 
children under 17 years of age is determined from change of address notifications from Family 
Allowance recipients. By combining information from these two administrative files, both 
preliminary and final estimates of emigrants are generated. The estimation procedures are 
similar to those used to estimate preliminary interprovincial migration: 

$$\hat{E}_j = \left[ f_c \, \frac{E_{j,0-17}}{P_{j,0-17}} \, P_{j,18+} \right] + E_{j,0-17} \qquad (5)$$

$$f_c = \frac{E'_{18+}}{P_{c,18+}} \bigg/ \frac{E'_{0-17}}{P_{c,0-17}} \qquad (6)$$

$$\hat{E} = \sum_{j=1}^{12} \hat{E}_j \qquad (7)$$

where: 

$\hat{E}_j$ = estimated annual number of emigrants from province j 

$\hat{E}$ = estimated annual number of emigrants from Canada 

$E_{j,0-17}$ = number of emigrants from province j aged 0 to 17 who were eligible for 
Family Allowance 

$P_{j,0-17}$ = number of children in province j who are eligible for Family Allowance 

$P_{j,18+}$ = adult population of province j obtained by subtracting the number of 
children eligible for Family Allowance from the total estimated population 

$f_c$ = annual adjustment factor for estimating adult emigration from Canada, 
based on Revenue Canada tax files. 

$E'_{18+}$ and $E'_{0-17}$ = estimated numbers of adult and child emigrants from Canada, based on 
Revenue Canada tax files. 

$P_{c,18+}$ and $P_{c,0-17}$ = estimated June 1st population of adults and children for Canada, based 
on the component method. 
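
Equations (5) to (7) can be traced through with a small example. The Python sketch below is illustrative only; the function names, the two-province universe and every number in it are hypothetical.

```python
def canada_factor(adult_emigrants_tax, adult_pop, child_emigrants_tax, child_pop):
    """Equation (6): adult-to-child emigration rate ratio from tax-file estimates."""
    return (adult_emigrants_tax / adult_pop) / (child_emigrants_tax / child_pop)

def provincial_emigrants(f_c, child_emigrants, child_pop, adult_pop):
    """Equation (5): estimated adult emigrants plus Family Allowance child emigrants."""
    return f_c * (child_emigrants / child_pop) * adult_pop + child_emigrants

# Hypothetical national inputs giving an adult/child rate ratio of 0.9.
f_c = canada_factor(adult_emigrants_tax=27_000, adult_pop=15_000_000,
                    child_emigrants_tax=10_000, child_pop=5_000_000)

# Hypothetical provinces: (child emigrants, child population, adult population).
provinces = {"A": (100, 50_000, 150_000), "B": (40, 20_000, 70_000)}
by_province = {name: provincial_emigrants(f_c, *vals)
               for name, vals in provinces.items()}
total_emigrants = sum(by_province.values())   # equation (7)
```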


The method of estimating the number of emigrants was modified in March 1989, affecting 
estimates after 1986. The new method combines counts by age of emigrants from Canada to 
the United States (from the U.S. Department of Justice, Immigration and Naturalization 
Service), and estimates of the numbers of emigrants from Canada to countries other than the 
U.S. based on Family Allowance files and an $f_c$ factor calculated from immigration files (see 
Raby, Martel and Cartier 1989). 


3. EVALUATION OF ESTIMATES OF THE COMPONENTS OF 
POPULATION CHANGE 


Each component of population change (births, deaths, immigrants, emigrants and inter- 
provincial migrants) may contain a degree of bias and error. However, the data on births, deaths 
and immigration can be regarded as more accurate than the estimates of emigrants and inter- 
provincial migrants. In 1982, the methods of estimating emigrants and internal migration were 
thoroughly updated (see Statistics Canada 1987). These revised methods are evaluated below. 

Table 1 
Estimates of Emigrants by Different Methods, Canada, 1976-1981 and 1981-1986 

Method                                      1976-81      1981-86 

Residual* 
  (a) Unadjusted                            277,558      476,373 
  (b) Adjusted for Undercoverage            196,955¹     134,857¹ 
  (c) Adjusted for Net Undercoverage        194,155²     218,148 
Revenue Canada Tax File                     207,420      165,272 
Family Allowance Method                     278,624      235,481 
Reverse Record Check                        296,724      288,376 

*Residual Method: 
Emigrants = ([Births - Deaths] + [Immigrants]) - Intercensal growth of population 
between time t and t + 5. 

¹ The undercoverage rates were 2.04% for the 1976 Census, 2.01% for the 1981 Census, and 3.21% for the 1986 Census. 

² The 1976, 1981 and 1986 Census net undercoverage rates were 1.53%, 1.51% and 2.40% respectively. They are 
estimated using the U.S. experience of overcoverage which is 25% of the undercoverage rate. 

Source: Demography Division, Statistics Canada. 


3.1 Emigration Data 


Table 1 presents estimates of emigrants from Canada by using different methods and data 
sources for 1976-1981 and 1981-1986. For 1981-1986, the estimate using the residual method 
is considerably higher than the estimate based on the Family Allowance file. The residual 
method subtracts the population growth between 1981 and 1986, unadjusted for census under- 
coverage, from natural increase and immigration. Since births, deaths and immigration data 
are assumed to be accurate, the higher estimate by the residual method can be attributed to 
the difference in undercoverage rates for 1981 and 1986. After adjusting the 1981 and 1986 
Census counts for undercoverage (2.01% and 3.21% respectively), the estimate by the residual 
method was found to be 134,857. This figure is lower than estimates obtained using both the 
Family Allowance file (235,481) and the Revenue Canada tax file (165,272). 

This low estimate may result from different rates of overcoverage in the 1981 and 1986 
Censuses. No estimate of overcoverage is calculated in the Reverse Record Check study, but 
the rate can be assumed to be similar to the U.S. Census rate which is 25% of the undercoverage 
rate. After adjusting the 1981 and 1986 Census counts for net coverage rates of 1.51% and 
2.40% respectively, the residual estimate (218,148) was close to the Family Allowance-based 
estimate (235,481). 

For 1976-1981, the estimating methods do not produce similar results. The number of 
emigrants estimated by the residual method adjusted for net undercoverage was 194,155, which 
is close to the estimate based on Revenue Canada tax files (207,420), but considerably lower 
than the Family Allowance method estimate (278,624) or the Reverse Record Check estimate 
(296,724). 

One possible source of error in the current method is the $f_c$ factors, the adult-child 
emigrant ratios used to estimate the number of emigrants aged 18+ for 1981-1986. These ratios 
were obtained from the emigration data provided by the Revenue Canada tax files. 

Table 2 shows $f_c$ values derived from different data sources. The $f_c$ factors from the 
Revenue Canada tax files are less than unity, whereas those from the three other data 
sources (interprovincial migration data from income tax files, immigration files, and data on 
Canadian emigrants to the United States) are higher than unity. The estimates of emigrants from these sources are 
also higher than the Revenue Canada-based estimate. 

Table 2 

Estimates of Emigrants by Family Allowance Method Using Different Values 
of $f_c$ (Adult-Child Emigrant Ratios), 1981-1986 

                                        Value of $f_c$ Factor                        Number 
Data Source of $f_c$                                                                 of 
                               1981-82  1982-83  1983-84  1984-85  1985-86         Emigrants 

1. Revenue Canada Tax 
   Files                       0.8698   0.8768   0.9052   0.8592   0.8592          235,481 
2. Interprovincial 
   Migration Data from 
   Income Tax Files            1.0760   1.1000   1.0664   1.0290   1.0029          265,816 
3. EIC Immigration 
   Data                        1.0801   1.0926   1.1723   1.1254   1.0694          275,762 
4. Canadian Emigrants 
   to the U.S.A.               1.2300   1.2774   1.3196   1.3745   1.4232          316,268 

Source: Demography Division, Statistics Canada. 


Each $f_c$ factor source shows annual variations. The $f_c$ factors for Canadians emigrating to 
the United States are particularly high, indicating that 23% to 42% more adults emigrated to 
the U.S. than did children. This is not surprising, as the southern American states have always 
been attractive to retirees. Hence the $f_c$ factor based on U.S. data may not be suitable for 
estimating Canadian emigrants to countries other than the U.S. 

Similarly, the $f_c$ factors for interprovincial migration, based on the income tax file, suggest 
that adult migrants have exceeded child migrants by up to 10% from 1981 to 1986. However, 
the adult migrant group likely contains a high proportion of younger adults, who tend to move 
more often between provinces than other age groups. Hence this data source is also very specific 
and thus not suitable for computing the overall $f_c$ factor. 

According to some authors (Beaujot and Rappak 1988), emigrant and immigrant flow data 
are associated, making it possible to compute an $f_c$ factor from the Employment and Immigration 
Canada (EIC) immigration file. $f_c$ factors from the EIC immigration file are intermediate 
between those derived from interprovincial migration data and U.S. emigrant data. The figure 
based on the $f_c$ factor from the immigration file (275,762) is higher than the official estimate 
of emigrants (235,481), but is close to that derived from the 1986 Reverse Record Check study 
(288,376). If the official estimate of the number of emigrants were increased to 275,762, the 
1986 error of closure between the population estimate and census counts would be reduced 
from 0.95% to 0.79%. 

In sum, for the 1981-86 period the estimates of emigrants seemed to be improved by taking 
$f_c$ factors from the Employment and Immigration Canada (EIC) immigration file rather than 
the Revenue Canada tax file. 

Yet in March 1989, it was discovered that emigrant estimates based on Family Allowance 
files and an $f_c$ factor derived from EIC immigration data were still too low after 1986. This 
seems to be a result of the high proportion (33%) of Canadian emigrants destined for the U.S. 
from 1981 to 1986, according to U.S. data. 

An analysis was also made of a method combining U.S. Department of Justice, Immigra- 
tion and Naturalization Service data on the numbers emigrating to the U.S. from Canada with child 
emigrant counts (ages 0-17) from Family Allowance files and an $f_c$ factor obtained from the 
EIC immigration file for all countries other than the U.S. For 1981 to 1986, the estimated 
number of emigrants by this method was 285,413. This revised estimate is much closer to the 
Reverse Record Check study figure (288,376). 

Table 3 

Estimates of Net Interprovincial Migration from 1986 Census Data on Mobility, 
Family Allowance Files, Income Tax Files, and Residual Method, 
Canada, Provinces and Territories, 1981-1986 

Geographic                    Family         Income        Residual 
Area           Census¹        Allowance      Tax           Method² 
                              Files          Files 

CANADA               0              0              0       -238,178 
Nfld.          -16,550        -14,837        -15,051        -26,111 
P.E.I.           1,540            293            751           -509 
N.S.             6,275          5,204          6,895         -4,095 
N.B.            -1,370         -2,239            -65        -11,212 
Que.           -63,295        -76,040        -81,254       -167,286 
Ont.            99,355        115,497        121,767         57,147 
Man.            -1,555         -3,700         -2,634         -8,180 
Sask.           -2,820           -668         -2,974        -13,564 
Alta.          -27,665        -34,073        -31,676        -50,811 
B.C.             9,500         13,289          7,382        -12,418 
Yukon           -2,665         -2,381         -2,775         -1,643 
N.W.T.            -755           -345           -366            504 

¹ Population 5 years of age and over. 
² The residual method for estimating net interprovincial migration is: 

Net Migration = Growth of Census Population between time t and t + 5 
- [(Births - Deaths) + (Immigration - Emigration)]. 

Source: Demography Division, Statistics Canada. 


3.2 Interprovincial Migration Data 


To test the accuracy of estimates of interprovincial migration obtained from the Revenue 
Canada tax file, two evaluations were conducted: (i) a comparison of sets of interprovincial 
migration data derived from the Revenue Canada tax files and Family Allowance files; and 
(ii) a comparison of the errors of closure of population estimates for two sets of internal migra- 
tion data. 

Table 3 presents net interprovincial migration estimates derived from four sources: 1986 
Census data on mobility; the Revenue Canada tax file; the Family Allowance file; and the 
residual-based net migration estimate. For all provinces, estimates of internal migration derived 
from the 1986 Census mobility data, the Revenue Canada tax file and Family Allowance files 
were consistent on the direction of net migration. All sources except the residual-based method 
show positive net migration for Prince Edward Island, Nova Scotia, Ontario and British 
Columbia. In other provinces, net migration was negative. 

The estimates of net interprovincial migration from Family Allowance files and Revenue 
Canada tax files are not strictly comparable to those from the residual method. By definition, the sum 
of net interprovincial migration in Canada should be zero. However, the sum produced using 
the residual method is about -238,000. In addition, the differences between the residual-based 
and the Revenue Canada/Family Allowance-based net interprovincial migration estimates are 
very high in Newfoundland, New Brunswick, Quebec, Ontario and Alberta. 

The coefficient of variation (the ratio of the standard deviation of the average absolute error 
of closure for the provinces to the average absolute error of closure) was used to measure the 
relative accuracy of the internal migration estimates. The other estimates of the components 
of population change were assumed to be accurate. Statistically, a coefficient of variation of 
20% to 30% is normally acceptable. 
Table 4 

Error of Closure Between Alternative Population Estimates and Census Counts 
by Province and Territory, 1971, 1976, 1981 and 1986 

                                                      Error of Closure¹ (%) 

Geographic Area                    1971              1976              1981              1986 
                               Income       FA   Income       FA   Income       FA   Income       FA 
                                  Tax               Tax               Tax               Tax 

Newfoundland                    -2.08    -1.64     0.49     1.34     1.63     2.30     1.97     2.01 
Prince Edward Island            -2.09    -2.01 [illeg.]     2.11    -0.05     1.02     0.99     0.63 
Nova Scotia                     -1.68    -2.39    -0.20 [illeg.]     0.30     0.40     1.24     1.04 
New Brunswick                   -1.93    -2.65    -1.29     1.81     0.13     0.54     1.58     1.04 
Quebec                          -0.33    -0.97    -0.05    -0.18    -0.30    -0.07     1.32     1.40 
Ontario                          0.11     0.99     0.15     0.16     0.64     0.37 [illeg.]     0.65 
Manitoba                         0.29     0.38    -0.27     0.39     1.07     0.87     0.51     0.41 
Saskatchewan                     0.44    -0.33     0.45     0.37    -0.31     0.28     1.08 [illeg.] 
Alberta                         -0.14 [illeg.]    -1.07    -1.11    -2.39    -2.64     0.73     0.63 
British Columbia                 0.01    -1.34     0.28    -1.10     0.03    -0.07     0.59     0.79 
Yukon                           -5.36    -5.99    -0.87     3.79    -1.98     2.06    -4.78    -3.10 
Northwest Territories           -2.12     2.64   -12.98    -3.39    -7.08     0.43    -1.44    -1.40 

Average Absolute Error 
  10 provinces                   0.91     1.33     0.44     0.97     0.69     0.86     1.07     1.01 
  Provinces and Territories      1.38     1.82     1.52     1.41     1.33     0.92     1.41     1.22 

Note: From 1976 to 1980, Revenue Canada data for children were available for age group 0-15 only. Therefore the 
J factors were calculated using migrants aged 0-15 and 16+ instead of 0-17 and 18+. 

¹ Error of closure is calculated using the following equation: 

\[
\text{Error of closure} = \frac{\text{Estimate} - \text{Census count}}{\text{Census count}} \times 100
\]

Income Tax: Revenue Canada Income Tax File. FA: Family Allowance File. 

Source: Estimates of interprovincial migration based on Family Allowance data, Demography Division, Statistics 
Canada. 
Estimates of interprovincial migration based on tax data, Small Area and Administrative Development 
Division, Statistics Canada. 

Table 5 

Coefficients of Variation of the Average Absolute Error of Closure between the Population 
Estimates and Census Counts among Provinces (n = 10), by Source of Interprovincial 
Migration Estimates, 1966-1971, 1971-1976, 1976-1981 and 1981-1986 

Period        Source         AAE        Standard     Coefficient of 
(t, t + 5)                   (t + 5)    Deviation    Variation (%) 
                             (1)        (2)          (3) = (2 ÷ 1) × 100 

1966-1971     Income Tax     0.91       0.2863       31 
              FA             1.33       0.2642       20 
1971-1976     Income Tax     0.44       0.1317       30 
              FA             0.97       [illeg.]     [illeg.] 
1976-1981     Income Tax     0.69       0.2463       36 
              FA             0.86       0.2855       33 
1981-1986     Income Tax     1.07       0.1496       14 
              FA             1.01       0.1570       16 

Note: AAE: Average absolute error of closure. 
Income Tax: Revenue Canada Income Tax File. 
FA: Family Allowance File. 
Source: Demography Division, Statistics Canada. 
However, one could argue that the coefficient of variation is not a good indicator of the 
quality of internal migration data. For example, a set of estimates with an absolute error of 
closure of 10% for every province would give a coefficient of variation of zero and consequently 
would appear preferable to a set of estimates with closure errors ranging between -1.0% 
and 1.0%. For cases like this, a quality measure that takes into account the size of the absolute 
error of closure as well as the standard deviation of absolute closure errors is clearly required. 
However, the likelihood of the provinces having the same absolute error of closure is extremely 
low (see Table 5); hence, the application of the coefficient of variation in this paper seemed 
to be valid. 
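
The weakness described above can be illustrated with a short sketch. It assumes that the "standard deviation of the average absolute error" is the sample standard deviation of the absolute errors divided by the square root of the number of provinces; this reading is an assumption, not a definition stated in the text. Both sets of closure errors are hypothetical.

```python
import math
import statistics


def coefficient_of_variation(errors):
    """Return the coefficient of variation (%) of the average absolute error.

    Assumption: the numerator is the standard error of the mean absolute
    error (sample standard deviation divided by sqrt(n)).
    """
    abs_errors = [abs(e) for e in errors]
    mean_abs = statistics.mean(abs_errors)
    std_of_mean = statistics.stdev(abs_errors) / math.sqrt(len(abs_errors))
    return 100.0 * std_of_mean / mean_abs


# Hypothetical closure errors (%) for ten provinces.
uniformly_bad = [10.0] * 10
small_but_varied = [-1.0, -0.8, -0.5, -0.2, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]

cv_bad = coefficient_of_variation(uniformly_bad)
cv_small = coefficient_of_variation(small_but_varied)
mean_bad = statistics.mean(abs(e) for e in uniformly_bad)
mean_small = statistics.mean(abs(e) for e in small_but_varied)

print(f"uniform 10% errors:  mean |error| = {mean_bad:.2f}, CV = {cv_bad:.1f}%")
print(f"small varied errors: mean |error| = {mean_small:.2f}, CV = {cv_small:.1f}%")
```

The uniformly poor set obtains the lower coefficient of variation even though its mean absolute error is far larger, which is why the coefficient of variation is best read alongside the average absolute error itself.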

Table 5 shows the coefficient of variation (computed from figures in Table 4) for popula- 
tion estimates based on two sets of internal migration estimates and the census counts for 1971, 
1976, 1981 and 1986. Before 1976, the coefficients of variation for migration data from tax 
files were 50% higher than for data from the Family Allowance file. This was expected, since the 
method for estimating migration from tax files was in the developmental stage. Furthermore, 
in estimating the number of interprovincial migrants, the J factor (adult to child migration 
rates) was based on Census mobility data, an approach found to be less satisfactory than the 
current method. However, for 1976-1981 and 1981-1986, the gap in the coefficient of variation 
between the tax and Family Allowance migration data narrowed considerably. 
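
As a check on how Table 5 follows from Table 4, the sketch below recomputes the 1966-1971 Income Tax row from the ten provincial errors of closure for 1971. It assumes that the "standard deviation" column is the standard error of the mean absolute error (sample standard deviation divided by the square root of n); under that assumption the published figures of 0.91, 0.2863 and 31% are reproduced. The function names are illustrative only.

```python
import math
import statistics

# Error of closure (%) for the ten provinces, 1971, Income Tax column of Table 4.
closure_1971_tax = [-2.08, -2.09, -1.68, -1.93, -0.33, 0.11, 0.29, 0.44, -0.14, 0.01]


def error_of_closure(estimate, census_count):
    """Error of closure (%) as defined in the footnote to Table 4."""
    return (estimate - census_count) / census_count * 100.0


def closure_summary(errors):
    """Return (average absolute error, standard error of that average, CV in %)."""
    abs_errors = [abs(e) for e in errors]
    aae = statistics.mean(abs_errors)
    # Assumption: "standard deviation of the average" means s / sqrt(n).
    std_of_mean = statistics.stdev(abs_errors) / math.sqrt(len(abs_errors))
    cv = 100.0 * std_of_mean / aae
    return aae, std_of_mean, cv


aae, std_of_mean, cv = closure_summary(closure_1971_tax)
print(round(aae, 2), round(std_of_mean, 4), round(cv))
```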

The tax-based migration data coefficient of variation was 9% higher in 1981 and 12% lower 
in 1986 than the coefficient of variation based on the Family Allowance file. Hence, the two 
sets of data are comparable, producing similar provincial estimates and errors of closure with 
the same level of variation among provinces. Since the coefficient of variation for each set is 
under 20% for 1981-1986, they provide acceptable data on internal migration. 

In conclusion, estimates of interprovincial migration from the Revenue Canada tax files 
for 1981-1986 are consistent with estimates from the Family Allowance file. By province, they 
yield small variations in the errors of closure. 


4. CONCLUSION AND DISCUSSION 


The Family Allowance files and Revenue Canada tax files play important roles in providing 
consistent emigration and internal migration estimates for Canada, and for the provinces and 
territories. For 1981 to 1986, estimates of emigrants and interprovincial migrants obtained from 
these files are acceptable for estimating total population. 

Nationally the error of closure (the difference between the population estimates and census 
counts) for 1986 was higher than for the census years 1971, 1976 and 1981. In addition, the 
errors of closure by province in 1986 were positively biased, indicating that in all provinces 
the estimates were higher than census counts. 

These discrepancies are largely a result of differences in coverage of the 1981 Census popula- 
tion, which was used as the bench-mark, and coverage of the 1986 Census population. The 
Reverse Record Check estimate of the 1981 undercoverage rate for Canada was 2.01%. The 
estimate for the 1986 Census was considerably higher, 3.21%. 

Errors in the estimates of the other components of change may also partly account for the 
discrepancies. 
Confidence Intervals for Postcensal Population 
Estimates: A Case Study for Local Areas 


DAVID A. SWANSON¹ 


ABSTRACT 


This paper presents a technique for developing appropriate confidence intervals around postcensal 
population estimates using a modification of the ratio-correlation method termed the rank-order pro- 
cedure. It is shown that the Wilcoxon test can be used to decide if a given ratio-correlation model is 
stable over time. If stability is indicated, then the confidence intervals associated with the data used 
in model construction are appropriate for postcensal estimates. If stability is not indicated, the con- 
fidence intervals associated with the data used in model construction are not appropriate, and, more- 
over, likely to overstate the precision of postcensal estimates. Given instability, it is shown that confidence 
intervals appropriate for postcensal estimates can be derived using the rank-order procedure. An 
empirical example is provided using county population estimates for Washington state. 


KEY WORDS: Population estimation; Confidence intervals; Ratio-correlation regression. 


1. INTRODUCTION 


A method of generating confidence intervals for postcensal estimates was not available until 
Espenshade and Tayman (1982) introduced a time-series regression estimation technique 
utilizing age-specific postcensal death rates. The Espenshade-Tayman technique represents an 
important breakthrough in estimation technology; however, like most breakthroughs it has 
limitations, of which two are notable: 


1. The technique is likely to be unsatisfactory at the subprovincial or substate level (Espen- 
shade and Tayman 1982); and 


2. It is a major departure from the standard regression technique used in Canada and the United 
States for estimating county-equivalent populations, namely, ratio-correlation. This departure 
is a particularly salient issue in terms of data requirements and the experience of people 
responsible for making county-equivalent and other subprovincial level population 
estimates (Statistics Canada 1987). The term “county-equivalent” is defined as a Census 
Division in Canada (Statistics Canada 1987) and as a county in nearly all U.S. states; notable 
exceptions in the U.S. include Alaska, in which county-equivalents are Census Areas, 
Louisiana, where parishes function as counties, and Virginia, in which “independent cities” 
are included as county-equivalents. 

This paper presents a means of developing confidence intervals for postcensal county- 
equivalent populations using the rank-order procedure, a modification of the ratio-correlation 
method introduced by Swanson (1980) that exploits causal modeling concepts to take into 
account postcensal structural changes in a given ratio-correlation model. 

There are three issues relevant to the development of confidence intervals made using the 
ratio-correlation method. The first has to do with model stability over time. If the structure 
of associations among model variables is invariant over time, then the confidence intervals 
constructed in regard to the model data set will apply to the population estimates generated 
by the model from the estimation data set. Although it has been consistently documented 
that it is not prudent to assume model invariance (D’Allesandro and Tayman 1980; Ericksen 
1973, 1974; Mandell and Tayman 1982; Namboodiri 1972; O’Hare 1976, 1980; Smith and 
Mandell 1984; Spar and Martin 1979; Swanson 1980; Swanson and Prevost 1986; Swanson 
and Tedrow 1984; Tayman and Schafer 1982; Verma et al. 1983), it would be useful to have 
a testing procedure for stability. This leads to the second issue, namely, the use of a statistical 
test. If the test indicates that stability cannot be assumed, and yet confidence intervals 
associated with, say, a model constructed using 1960-70 data, are applied to estimates 
generated for, say, 1979, they are likely to overstate the level of precision in the 1979 estimates. 
Thus, the third issue is the need for a procedure that will generate appropriate confidence 
intervals. 

In the report that follows, a description of ratio-correlation is provided along with the 
modification that forms the basis for developing appropriate confidence intervals. Next, the 
logic for developing these confidence intervals is formally described, followed by an empirical 
example showing both the test for instability and the generation of both “inappropriate” 
and “appropriate” confidence intervals. 


2. METHODOLOGY FOR POPULATION ESTIMATION 


Ratio-correlation is a regression method designed to measure the temporal change in county- 
equivalent population proportions using observed temporal change in proportions of symp- 
tomatic indicators such as registered voters, covered employment and public school enroll- 
ment. The temporal change is measured by simply taking a ratio of proportions at two points 
in time. 

Since enumerated population numbers for all county-equivalents are available only from 
the federal census, a ratio-correlation regression model is always constructed using two points 
in time separated by a regular number of years. It is formally described as 


\[
R_{i,t} = a_0 + \sum_{j=1}^{k} b_j R_{i,j,t} + \epsilon_i
\]

where 

$a_0$ = the intercept term to be estimated 
$b_j$ = the regression coefficient to be estimated 
$\epsilon_i$ = the error term 
$j$ = symptomatic indicator ($1 \le j \le k$) 
$i$ = county-equivalent ($1 \le i \le n$) 
$t$ = the year of the most recent census 

and 

\[
R_{i,t} = \frac{P_{i,t} \,/\, \sum_i P_{i,t}}{P_{i,t-z} \,/\, \sum_i P_{i,t-z}} \qquad [1.A]
\]

\[
R_{i,j,t} = \frac{S_{i,j,t} \,/\, \sum_i S_{i,j,t}}{S_{i,j,t-z} \,/\, \sum_i S_{i,j,t-z}} \qquad [1.B]
\]

where 

$z$ = the number of years between each census 
$P$ = Population 
$S$ = Symptomatic Indicator 


Once a model is constructed, it is used to develop a postcensal estimate for time $t + x$ by 
substituting $(S_{i,t+x} / \sum S_{i,t+x})_j$ into the numerator of the right-hand side of equation [1.B] 
while $(S_{i,t} / \sum S_{i,t})_j$ is substituted into the denominator of the right-hand side of equation 
[1.B]. This means that once $R_{i,t+x}$ is obtained, an actual population for area $i$ at time 
$t + x$ is developed by introducing an independently estimated total population, $P_{t+x}$, into 
equation [1.A] and algebraically solving equation [1.A] for $P_{i,t+x}$. Since $\sum P_{i,t+x}$ does not 
usually equal the independently derived total, $P_{t+x}$, an adjustment is made to force the 
summed population figures to the independently estimated total. 
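
The estimation steps just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name and all input data are hypothetical, the adjustment to the independent total is assumed to be a simple proportional scaling, and only the coefficients are taken from the modified model reported in Section 4.

```python
def postcensal_estimates(intercept, coefs, indicators_t, indicators_tx, pop_t, total_tx):
    """Apply a fitted ratio-correlation model and adjust to a control total.

    indicators_t, indicators_tx: one row per area, one column per symptomatic
        indicator, observed at the census year t and at the estimate year t + x.
    pop_t: census population of each area at time t.
    total_tx: independently estimated total population at time t + x.
    """
    k = len(coefs)
    pop_total_t = sum(pop_t)
    ind_total_t = [sum(row[j] for row in indicators_t) for j in range(k)]
    ind_total_tx = [sum(row[j] for row in indicators_tx) for j in range(k)]

    unadjusted = []
    for i, p_it in enumerate(pop_t):
        r_hat = intercept
        for j in range(k):
            # Ratio of indicator shares, as in equation [1.B].
            share_tx = indicators_tx[i][j] / ind_total_tx[j]
            share_t = indicators_t[i][j] / ind_total_t[j]
            r_hat += coefs[j] * (share_tx / share_t)
        # Solve equation [1.A] for the area population at t + x.
        unadjusted.append(r_hat * (p_it / pop_total_t) * total_tx)

    # Assumed proportional adjustment so the areas sum to the control total.
    scale = total_tx / sum(unadjusted)
    return [p * scale for p in unadjusted]


# Coefficients of the modified model in Section 4; all data below are hypothetical.
intercept = 0.046618
coefs = [0.066786, 0.50727, 0.38736]
pop_t = [50_000, 30_000, 20_000]
indicators_t = [[20_000, 25_000, 9_000], [12_000, 15_000, 6_000], [8_000, 10_000, 5_000]]
indicators_tx = [[23_000, 28_000, 9_500], [12_500, 15_500, 6_200], [8_500, 10_500, 5_800]]

estimates = postcensal_estimates(intercept, coefs, indicators_t, indicators_tx, pop_t, 110_000)
print([round(p) for p in estimates])
```

When every area keeps the same indicator shares, the predicted ratio is identical for all areas and the adjusted estimates simply reproduce the census-year population shares.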

One limitation of ratio-correlation is that its structure is assumed to be invariant over time, 
which is why the rank-order procedure was introduced by Swanson (1980). The rank-order procedure 
is based on the fact that information contained in the zero-order correlations found in an 
estimation data set can be exploited using work by Land (1969, Chapter IV), which is in turn based 
on the fundamental theorem underlying path analysis as developed by Wright (1921). It involves a 
theoretical reversal of the role of the dependent variable in the regression model, the population 
variable, which is treated as an unmeasured, causally prior variable, and a just-identified 
structure: a minimum of three predictor variables (in the regression model), the covariance of 
which is assumed to be due to the fact that they are all causally related to the population variable. 


3. METHODOLOGY FOR CONFIDENCE INTERVALS ESTIMATION 


If the relationships found among the variables in the model data set remain stable over time 
(as shown through the rank-order procedure) then the same relationships should be found 
among the variables in the estimation data set. This stability would indicate that the S.E.E. 
associated with the model data set is appropriate for generating confidence intervals for the 
estimation data set. However, if stability does not exist, then the S.E.E. associated with the 
model data set is not appropriate, and may, in fact, generate confidence intervals that overstate 
the precision of postcensal estimates. These considerations lead to the question of determining 
stability through statistical inference. 

In answering the question just posed, consider that we are examining related pairs of 
variables. This implies that the Wilcoxon matched-pairs signed rank test could be used 
(Mosteller and Rourke 1973). In using this test, the null hypothesis is that there are no 
differences between the population estimates (scores) produced by the unmodified and modified 
regression models. 
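
A minimal sketch of the matched-pairs signed rank test is given below, using the large-sample normal approximation without a correction for ties; whether this matches the exact computation of any particular statistical package is an assumption. The function names and the paired values are hypothetical, and the final line simply converts the Z statistic reported in Section 4 into a two-sided probability.

```python
import math


def wilcoxon_signed_rank_z(x, y):
    """Z statistic of the Wilcoxon matched-pairs signed rank test.

    Normal approximation, zero differences dropped, tied absolute
    differences given average ranks, no tie correction to the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        average_rank = (pos + end) / 2.0 + 1.0
        for idx in order[pos:end + 1]:
            ranks[idx] = average_rank
        pos = end + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4.0
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w - mean_w) / sd_w


def two_sided_p(z):
    """Two-sided probability of a standard normal value at least as extreme as z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical paired estimates from an unmodified and a modified model.
unmodified = [10.0, 12.0, 14.0, 16.0]
modified = [9.0, 10.0, 11.0, 12.0]
z = wilcoxon_signed_rank_z(unmodified, modified)
print(round(z, 4), round(two_sided_p(z), 4))

# Two-sided probability for the Z value reported in Section 4.
print(round(two_sided_p(-3.2096), 4))
```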

The key to developing confidence intervals for postcensal county-equivalent population 
estimates is found in the fact that the rank-order procedure generates a set of regression 
coefficients for the estimation data set. From these coefficients, estimates of $R^2$ and the S.E.E. 
for the estimation data set can be developed, and the estimated S.E.E. leads directly to the 
development of confidence intervals. First, recall that the coefficient of multiple determination, 
$R^2$, is simply the sum of the products of each zero-order correlation between an independent 
variable and the dependent variable, and the standardized regression coefficient for each 
independent variable (Hayes 1973), so that S.E.E. is (Hayes 1973) 

\[
\text{S.E.E.} = \left[ \frac{(n)(S^2)(1 - R^2)}{n - 2} \right]^{1/2}
\]

where 

$n$ = number of cases (county-equivalents) 
$S^2$ = variance of the dependent variable 
$R^2$ = coefficient of multiple determination 



The formula for generating a confidence interval around a given estimated value for a point 
on a (population) regression line is provided by Kmenta (1971) 

\[
\hat{Y}_i \pm (t_{n-2,\alpha/2}) (\text{S.E.E.})
\]

An important point to realize is that the confidence interval is not directly generated for 
a population estimate, rather it is for the estimated ratio of proportions, or $R_{i,t+x}$. However, 
as shown by Espenshade and Tayman (1982), a confidence interval around one variable can 
be translated for another variable algebraically substituted for the first. Thus, by finding the 
lower and upper confidence boundaries of $R_{i,t+x}$, these lower and upper confidence 
boundaries can be translated into the population values: 

\[
R_{i,t+x} \pm (t_{n-2,\alpha/2}) (\text{S.E.E.})
\]

\[
\frac{P_{i,t+x} / P_{t+x}}{P_{i,t} / P_{t}} \pm (t_{n-2,\alpha/2}) (\text{S.E.E.})
\]

which leads to 

\[
\text{L.L.}(P_{i,t+x}) = \frac{P_{i,t}}{P_{t}} \, P_{t+x} \left[ R_{i,t+x} - (t_{n-2,\alpha/2}) (\text{S.E.E.}) \right]
\]

\[
\text{U.L.}(P_{i,t+x}) = \frac{P_{i,t}}{P_{t}} \, P_{t+x} \left[ R_{i,t+x} + (t_{n-2,\alpha/2}) (\text{S.E.E.}) \right]
\]
4. EMPIRICAL STUDY 


Table 1.A in Swanson (1980) gives the zero-order correlations relating to a ratio-correlation 
model for estimating county civilian populations under sixty-five years from employment, 
voters, and grades 1-8 enrollment for the state of Washington, for the period 1950-1960. 
Characteristics of the model constructed from these data are given in Table 1.B, while Tables 
2.A and 2.B provide similar results for the 1960-1970 period as found in Swanson (1980). This 
latter set forms the estimation data over which the procedure will be described. 

Although full knowledge of the estimation data set is available, the procedure is used as 
if this were not the case. Of course, what is known in any estimation problem is the zero-order 
correlation matrix for the independent variables, which is used in conjunction with the fun- 
damental theorem of path analysis to estimate the coefficients for the modified model. Using 
the complete rank-order procedure, the modified model (Swanson 1980) is: 


\[
Y = 0.046618 + 0.066786 X_1 + 0.50727 X_2 + 0.38736 X_3
\]


Estimates for 1970 of the county civilian population under sixty-five years of age (adjusted 
to the independently estimated state total) resulting from the preceding modified model are 
presented in Table 1 along with the actual enumerated populations. 

The Wilcoxon test was conducted for the Washington data using the procedure in the SPSSx 
NPAR Tests command (SPSS 1986). To save space, the unmodified and modified population 
estimates are not presented. They can be found in Table 3 of Swanson (1980). Under the null 
hypothesis, the probability of obtaining Z = —3.2096 is 0.0013. Thus, the null hypothesis 
is rejected and it is assumed that instability exists for Washington counties in going from the 
model constructed using 1960/1950 data to the true unknown model associated with 1970/1960 
data. 

As a note of interest, the Chow test (Chow 1960) validated the results of the Wilcoxon test 
by showing that the difference between the ‘‘true’’ 1970-1960 ratio-correlation model and the 
1960/1950 ratio-correlation model was statistically significant. 

Had the results of the Wilcoxon test led us not to reject the null hypothesis, we would have 
used the unmodified coefficients from the 1960/1950 model data set to generate 1970 popula- 
tion estimates for Washington counties. Further, the S.E.E. for this same model (0.05022) 
would have been used to generate confidence intervals for the 1970 estimates. However, the 
results of the Wilcoxon test led us to reject the null hypothesis in this case. This indicates the 
modified coefficients developed using the rank-order procedure should be used in lieu of the 
unmodified model. Further, it indicates the need for a revised S.E.E., one that is not likely 
to overstate the precision of the 1970 estimates. 

Using the estimated values found in the 1970 example data for Washington state (Swanson 
1980) we find 


\[
R^2 = (0.07533)(0.75290) + (0.47085)(0.92146) + (0.49481)(0.88082) = 0.926
\]

and 

\[
\text{S.E.E.} = \left[ \frac{(39)(S^2)(1 - 0.926)}{39 - 2} \right]^{1/2} = 0.0599
\]
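
The arithmetic for the estimated coefficient of determination can be checked with a short sketch. It assumes that the first factor of each product is the standardized regression coefficient and the second the zero-order correlation with the dependent variable; the ordering is inferred, not stated. The variance passed to the S.E.E. function is hypothetical and serves only to exercise the formula.

```python
import math

# Factors of the R-squared expression in the text.
# Assumption: first factor = standardized coefficient, second = zero-order correlation.
std_coefs = [0.07533, 0.47085, 0.49481]
zero_order = [0.75290, 0.92146, 0.88082]

r_squared = sum(b * r for b, r in zip(std_coefs, zero_order))


def standard_error_of_estimate(n, variance_y, r2):
    """S.E.E. = [n * S^2 * (1 - R^2) / (n - 2)] ** 0.5, following Section 3."""
    return math.sqrt(n * variance_y * (1.0 - r2) / (n - 2))


print(round(r_squared, 3))

# Hypothetical variance of the dependent variable, for illustration only.
print(standard_error_of_estimate(39, 0.05, r_squared))
```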
Table 1 

90% Confidence Interval for the Estimated Civilian Population 
Under Sixty-Five Years by County, 
State of Washington 1970 

County            Enumerated Lower Limit    Estimate Upper Limit      90% Confidence 
                  Population                                          Interval (in percent) 

Adams                  11102       10335       11458       12581      ±  9.80 
Asotin                 11862       10469       11814       13154      ± 11.38 
Benton                 63144       60405       67511       74616      ± 10.53 
Chelan                 35862       31733       36177       40620      [illeg.] 
Clallam                30023       28063       31294       34525      [illeg.] 
Clark                 116663      101183      111437      121690      ±  9.20 
Columbia                3771        3683        4161        4639      ± 11.49 
Cowlitz                62586       55170       61581       67992      ± 10.41 
Douglas                15287       14569       16252       17935      ± 10.36 
Ferry                   3336        2963        3397        3831      ± 12.78 
Franklin               23983       21960       24631       27302      ± 10.84 
Garfield                2546        2447        2761        3075      [illeg.] 
Grant                  38921       37561       42606       47651      ± 11.84 
Grays Harbor           52583       46294       52114       57935      [illeg.] 
Island                 20589       20512       22148       24040      ±  7.39 
Jefferson               9235        8440        9473       10506      ± 10.90 
King                 1054271      935664     1037937     1140203      ±  9.85 
Kitsap                 86529       77022       85821       94619      ± 10.25 
Kittitas               22764       17649       19863       22077      [illeg.] 
Klickitat              10729       10440       11923       13406      ± 12.44 
Lewis                  39265       35747       40122       44497      ± 10.90 
Lincoln                 8168        7939        9107       10275      ± 12.83 
Mason                  18411       16057       17827       19596      ±  9.93 
Okanogan               22952       21002       23795       25688      ± 10.97 
Pacific                13310       11270       12795       14320      ± 11.92 
Pend Oreille            5185        5147        5893        6639      ± 12.86 
Pierce                339048      314272      346728      379184      ±  9.36 
San Juan                3089        2636        2918        3201      ±  9.66 
Skagit                 45703       43255       48758       54261      ± 11.29 
Skamania                5330        4787        5358        5929      ± 10.66 
Snohomish             245193      213164      231996      250827      ±  8.12 
Spokane               251057    [illeg.]      256723      286072      ± 11.43 
Stevens                15178       13869       15780       17692      ± 12.11 
Thurston               68719       63644       69540       75436      ±  8.48 
Wahkiakum               3137        3033        3397        3761      ± 10.72 
Walla Walla            36608       33727       38271       42812      ± 11.87 
Whatcom                72111       63218       70670       78122      ± 10.54 
Whitman                34843       28960       32409       35858      ± 10.64 
Yakima                128960      120347      136203      152219      ± 11.69 
Note that, from Table 2 in Swanson (1980), the actual $R^2$ and S.E.E. values are 0.878 and 
0.05077, respectively. In comparison with the actual S.E.E. of 0.05077, the estimated S.E.E. 
is higher. This is appropriate given that we are more uncertain about the precision of estimates 
generated by the rank-order procedure than we would be about the precision associated with 
the “true” model, if in fact the true model were obtainable. With the rank-order procedure, 
we can now generate a confidence band from the following formula: 

\[
\hat{Y}_i \pm (t_{37,\alpha/2}) (0.0599)
\]


In Table 1 an empirical example using a 90% confidence interval is given for the 1970 
estimated county population figures presented also in Table 1. Here, the 90% confidence 
interval is given by: 


\[
\left[ \frac{P_{i,1960}}{P_{1960}} \right] (3{,}032{,}053) \left[ R_i \pm (1.69)(0.0599) \right]
\]
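
The translation of the interval around the estimated ratio into population limits can be sketched as follows. The function name, the county share of the 1960 state population and the estimated ratio are hypothetical; the state total used here is the sum of the enumerated county populations in Table 1, taken as a stand-in for the independently estimated total.

```python
def population_interval(share_prev, total_now, r_hat, t_crit, see):
    """Translate an interval around the estimated ratio into population limits.

    share_prev: area share of the total population at the previous census.
    total_now: independently estimated total population for the estimate year.
    r_hat: estimated ratio of proportions for the area.
    Returns (lower limit, estimate, upper limit, half-width in percent).
    """
    half_width = t_crit * see
    base = share_prev * total_now
    lower = base * (r_hat - half_width)
    estimate = base * r_hat
    upper = base * (r_hat + half_width)
    percent = 100.0 * half_width / r_hat
    return lower, estimate, upper, percent


STATE_TOTAL_1970 = 3_032_053   # sum of the enumerated column of Table 1 (stand-in)
T_CRIT = 1.69                  # t value used in the text for the 90% level
SEE = 0.0599                   # estimated S.E.E. from the rank-order procedure

# Hypothetical 1960 share and estimated ratio for a single county.
lower, estimate, upper, percent = population_interval(0.0035, STATE_TOTAL_1970, 1.033, T_CRIT, SEE)
print(round(lower), round(estimate), round(upper), round(percent, 2))
```

Because the half-width of the interval around the ratio is the same for every county, the percentage interval depends only on the size of the estimated ratio: counties with larger estimated ratios receive proportionally narrower intervals.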


In examining the confidence intervals given in Table 1 in combination with the enumerated 
populations provided, it is found that in only one county (Kittitas) is the enumerated popula- 
tion outside of the 90% confidence interval. In this instance, the enumerated population exceeds 
the upper limit by 687 people. At a 90% level of confidence, the intervals are fairly wide, with 
a mean of ± 10.81 percent, a minimum of ± 7.39 percent for Island County and a maximum of ± 12.83 
percent in Lincoln County. Compare these with the mean of the absolute percent errors 
associated with the 1970 estimates, which is 4.89 (Swanson 1980). This comparison suggests 
that the 90% level generates intervals that are too broad for practical use. Given this, it is of 
interest to consider which level of confidence would be more appropriate. It is also of interest 
to consider the effect of using the unmodified S.E.E. (0.05022) from the 1960/1950 model. 
We would expect that the confidence intervals generated by the unmodified model would be 
too optimistic. That is, at a given level of confidence, there would be fewer than expected 
counties for which the interval encompassed the actual population. To explore these issues, 
Table 2 was constructed. 
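
The coverage check described above amounts to counting the counties whose enumerated population falls outside the interval. The sketch below applies it to four rows taken from Table 1; because it uses only a subset of counties, the coverage percentage it prints is illustrative and is not the figure for the full table. The function name is hypothetical.

```python
# (county, enumerated 1970 population, lower limit, upper limit), from Table 1.
rows = [
    ("Adams", 11102, 10335, 12581),
    ("Kittitas", 22764, 17649, 22077),
    ("Lincoln", 8168, 7939, 10275),
    ("Thurston", 68719, 63644, 75436),
]


def counties_outside(rows):
    """Return (county, signed distance beyond the nearer limit) for each miss."""
    misses = []
    for county, actual, lower, upper in rows:
        if actual < lower:
            misses.append((county, actual - lower))
        elif actual > upper:
            misses.append((county, actual - upper))
    return misses


outside = counties_outside(rows)
coverage = 100.0 * (len(rows) - len(outside)) / len(rows)
print(outside, coverage)
```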

In Table 2, two distinct sets of information are provided. For both sets, however, a com- 
parison is made between the unmodified and modified estimates and their associated confidence 
intervals. In regard to the issue of expecting optimistic confidence intervals for the 1970 
estimates generated by the unmodified model, Table 2 indicates that at varying levels of con- 
fidence ranging from 90% down to 50%, the intervals are, indeed, optimistic in that for only 
two of the six levels examined are the expected number of county estimates within the specified 
level of precision. At the 80% level, for example, only 28 (72 percent) of the counties have 
enumerated 1970 populations within the confidence interval specified around the estimates; 
at the 60% level, only 22 (56%) of the counties have enumerated 1970 populations within the 
confidence interval specified around the estimates. 

The second aspect of Table 2 is the mean interval associated with a given level of confidence. 
At the 90% level, the mean of the intervals associated with the unmodified model is 9.10 per- 
cent; for the modified model it is 10.81 percent. At the 50% level, the means are 3.66% and 
4.35%, respectively. Thus, it is clear that the 60% and 50% levels of confidence generate a 
mean interval that is more in line with the mean absolute percent error, which is 4.88 for the 
modified model. 
Table 2 

Number (%) of Counties in Which Actual 1970 Population 
was Inside the Confidence Interval 

Level of         Unmodified           Modified 
Confidence       S.E.E. (0.05022)     S.E.E. (0.0599) 

90%              35 (89.7%)           38 (97.4%) 
80%              28 (71.8%)           33 (84.6%) 
70%              24 (61.5%)           29 (80.6%) 
66.66%           24 (61.5%)           26 (66.66%) 
60%              22 (56.4%)           23 (59.0%) 
50%              20 (51.3%)           22 (56.4%) 

Mean Interval (in percent) 

Level of         Unmodified           Modified 
Confidence       S.E.E. (0.05022)     S.E.E. (0.0599) 

90%              9.10                 10.81 
80%              7.02                 8.38 
70%              5.66                 6.75 
66.66%           5.59                 6.40 
60%              4.59                 5.47 
50%              3.66                 4.35 


In examining the issue of confidence intervals, it appears that a procedure is needed for 
generating confidence intervals that are not misleading in terms of the precision of postcensal 
county-equivalent population estimates. However, guidance is also needed on selecting a given 
level of confidence that is appropriate for the estimates. Of interest in this regard is the work 
of Stoto (1983) on empirical confidence intervals for population projections. One of Stoto’s 
(1983:18) findings is that the high and low population projections produced for the United States 
by the Bureau of the Census (1977) correspond to a 66.66% confidence interval. It may be 
the case that the 66.66% confidence level is also appropriate for county-equivalent postcensal 
populations, although in this test this level of confidence generates a mean interval of 
6.4 percent for the modified estimates, which is somewhat above their mean percent error (4.9). 
Another consideration is the length of time between the year for which a postcensal estimate 
is desired and the preceding census. In the example, the maximum period of postcensal time 
in the United States was used, 10 years. For each county, we have, in essence, a situation in 
which maximum uncertainty exists in regard to estimates. From this perspective, the relatively 
wide interval generated for each county at a 90 percent level of confidence is appropriate. We 
would expect that structural model changes occur relative to time. Hence, a narrower band 
would likely be generated in the first year following the end-census year of model construc- 
tion than in the second year; and so on through the intercensal period. 
5. CONCLUSION 


At this point it should be clear that the rank-order procedure is not being presented as a 
fully-validated technique for constructing confidence intervals around postcensal county- 
equivalent population estimates. However, it appears to offer a reasonable starting point. Even 
with its limitations, the use of the Wilcoxon test and the confidence intervals developed using 
the rank-order procedure appears capable of providing benefits to those responsible for making 
such postcensal population estimates. In the first place, as noted by Espenshade and Tayman 
(1983), it is important to provide the users of postcensal population estimates some notion of 
their accuracy, as both the Wilcoxon test and the confidence intervals do. Second, with the 
selection of appropriate confidence intervals, a formal means is available for resolving disputes 
over the population of a given county-equivalent by using hypothesis testing procedures. Third, 
S.E.E. can be used as a basis for selecting one model over another. This means that a set of 
different ratio-correlation models could be considered for any given postcensal estimation year 
and, further, that a formal criterion is available for selecting one model over another. This 
feature could be useful in the event that the ratio-correlation estimates generated by a federal, 
provincial or state demographic center, are challenged in a given postcensal year, an event that 
has become more frequent, especially in the U.S. (D’Allesandro 1987). 
GUIDELINES FOR MANUSCRIPTS 


Before having a manuscript typed for submission, please examine a recent issue (Vol. 10, 
No. 2 and onward) of Survey Methodology as a guide and note particularly the following 


points: 

1. Layout 

1.1 Manuscripts should be typed on white bond paper of standard size (8½ × 11 inch), 
one side only, entirely double spaced with margins of at least 1½ inches on all sides. 

1.2 The manuscripts should be divided into numbered sections with suitable verbal titles. 

1.3. The name and address of each author should be given as a footnote on the first page 
of the manuscript. 

1.4 Acknowledgements should appear at the end of the text. 

1.5 Any appendix should be placed after the acknowledgements but before the list of 
references. 

2. Abstract 
The manuscript should begin with an abstract consisting of one paragraph followed 
by three to six key words. Avoid mathematical expressions in the abstract. 

3. Style 

3.1 Avoid footnotes, abbreviations, and acronyms. 

3.2 Mathematical symbols will be italicized unless specified otherwise except for functional 
symbols such as “exp(·)” and “log(·)”, etc. 

3.3 Short formulae should be left in the text but everything in the text should fit in single 
spacing. Long and important equations should be separated from the text and numbered 
consecutively with arabic numerals on the right if they are to be referred to later. 

3.4 Write fractions in the text using a solidus. 

3.5 Distinguish between ambiguous characters (e.g., w, ω; o, O, 0; l, 1). 

3.6 Italics are used for emphasis. Indicate italics by underlining on the manuscript. 

4. Figures and Tables 

4.1 All figures and tables should be numbered consecutively with arabic numerals, with 
titles which are as nearly self explanatory as possible, at the bottom for figures and 
at the top for tables. 

4.2 They should be put on separate pages with an indication of their appropriate place- 
ment in the text. (Normally they should appear near where they are first referred to). 

5. References 

5.1 References in the text should be cited with authors’ names and the date of publication. 
If part of a reference is cited, indicate after the reference, e.g., Cochran (1977, p. 164). 

5.2 The list of references at the end of the manuscript should be arranged alphabetically 


and for the same author chronologically. Distinguish publications of the same author 
in the same year by attaching a, b, c to the year of publication. Journal titles should 
not be abbreviated. Follow the same format used in recent issues. 
In This Issue 


In this issue’s special section, we take a look back and a look forward. Our contributors to this 
section are well-known survey statisticians who bring a wealth of experience and knowledge. By 
looking back with clarity to developments in our field, they enable us to look forward to areas 
of emerging interest. With one exception, each paper has discussants, with a reply by the authors. 

Rao and Bellhouse present an historical perspective on sample survey theory and methods. 
Beginning with a discussion of some of the earliest developments in the field, they then take us 
through the design- versus model-based debate, variance estimation methods, analysis of survey 
data and recent developments in computer software. The paper includes an extensive 
bibliography. Smith’s comments complement the paper, providing a somewhat different 
perspective, including some thoughts on the position of sample survey theory relative to 
‘‘mainstream’’ statistics. 

Beginning with a discussion of the role of governments and social researchers in the earliest 
sample surveys and censuses, Fienberg and Tanur describe the institutional bases for survey 
research, particularly in the United States. Among the organizations considered are government 
agencies, statistical associations, polling firms and universities. The authors discuss recent 
developments including increased telephone interviewing and cognitive aspects of surveys. They 
end by discussing links among the various sectors which make up the field. In his discussion, 
Groves also looks at the sectors and states that movement of people among them has been less 
common than Fienberg and Tanur’s examples suggest. He also adds substantially to the list of 
recent developments. 

Whereas Fienberg and Tanur look at government institutions as one component out of several, 
Bailar focuses on the important role played by the U.S. Bureau of the Census in the development 
of sample survey methods. She discusses the motivation for, and development of, various 
methods and approaches including sampling and seasonal adjustment. The paper concludes with 
a look to the future. Brackstone emphasizes that practical problems gave rise to the advances 
discussed by Bailar. He also adds several other contributions made by Statistics Canada and other 
agencies to those mentioned by Bailar. Brackstone also points to the importance of a suitable 
environment to encourage innovation. 

Kish discusses alternatives to current periodic censuses. He rekindles the debate on the 
feasibility of replacing them by rolling censuses. He discusses the use of administrative data in 
this context, pointing out the existence of good sources of data in some countries. An important 
issue is how to cumulate data from rolling samples and censuses. Various alternatives are 
discussed. In his discussion, Scheuren points out that Kish is, in effect, advocating a major shift 
in our way of thinking - always a difficult task. While Scheuren feels that pure rolling censuses 
are likely to be too expensive, variations, along with the use of improved administrative data, 
should be feasible. Both authors agree that there is much research required for further progress. 

We are pleased to have Morris Hansen, who participated in many important developments 
mentioned by the authors, as a discussant of all the above papers. He adds important historical 
details and corrects some errors and misconceptions. One item of particular interest is Hansen’s 
discussion of the reluctance to introduce sampling - something which we now tend to take for 
granted. His insightful comments on individual topics are too numerous and varied to summarize 
here. 

Dalenius and Sarndal initially intended to discuss Bailar’s paper, but their paper metamor- 
phosed into a history of sampling techniques in Sweden. As such, it serves as a summary and 
update of Dalenius’s 1957 book. 




The remaining papers in this issue of Survey Methodology deal with a diversity of topics. Kott 
proposes an unbiased estimator of variance for a two-phase sampling design where both phases 
are stratified simple random sampling. Such designs are commonly used, especially in agricultural 
surveys. 

Two-phase sampling with stratification at both phases is also the subject of White’s paper. 
An estimator due to Vardeman and Meeden which uses prior information is studied via simula- 
tion. Some theoretical results are also given for the case where the prior information is not used. 

Julien and Maranda describe the sample design used for the National Farm Survey since 1988. 
The efficiency of the new design is evaluated by comparing the precision of the survey estimates 
for 1988 to those for 1987, as well as to the expected precision obtained during the development 
of the new design. 

The results of a study in Saskatchewan are analyzed by Hay to examine the effects on responses 
of the method of data collection: self-administered questionnaire versus personal interview. 
Although statistically significant differences are found, they are not of sufficient magnitude to 
be of practical importance. 

Langlet studies the use of cluster analysis to deal with the problem of imputation for item 
nonresponse. This technique would be especially useful in situations where the number of imputa- 
tion classes is rather large. 

Béland and Théberge use randomization tests to compare two questionnaires which were used 
to study the questions likely to be asked in the 1991 census. Since tests of this type may not be 
familiar to many survey methodologists, this paper will serve as a useful introduction. 

In his paper, Cantwell derives a simple variance expression for a general composite estimator 
commonly considered for rotating designs. He deals with both single-level and multi-level rotation 
plans. 


The Editor 
History and Development of the Theoretical 
Foundations of Survey Based 
Estimation and Analysis 


J.N.K. RAO and D.R. BELLHOUSE¹ 


ABSTRACT 


Early developments in sampling theory and methods largely concentrated on efficient sampling designs 
and associated estimation techniques for population totals or means. More recently, the theoretical foun- 
dations of survey based estimation have also been critically examined, and formal frameworks for inference 
on totals or means have emerged. During the past 10 years or so, rapid progress has also been made in 
the development of methods for the analysis of survey data that take account of the complexity of the 
sampling design. The scope of this paper is restricted to an overview and appraisal of some of these 
developments. 


KEY WORDS: Foundations of inference; Analysis of survey data; Computer software. 


1. SOME EARLY MILESTONES 


The motivation behind much of the work in survey sampling prior to the 1950’s or 60’s was 
the desire to obtain reasonably efficient estimates, at a desired cost, of totals, means, or pro- 
portions for large, and increasingly complex-structured, finite populations. A discussion of 
the early work in sampling human populations may be found in several review papers (see e.g., 
Hansen, Dalenius and Tepping 1985 and Bellhouse 1988). 

The history of the mathematical theory of survey sampling has its origins in the late nine- 
teenth century through the work of the Norwegian statistician A.N. Kiaer. Kiaer was the first 
to promote what was then called ‘the representative method’, or sampling, over complete 
enumeration. What Kiaer (1897) meant by representative sampling was that the sample should 
mirror the parent finite population. This can be achieved in two ways, by randomization or 
by balanced sampling through purposive selection. Initially, purposive selection was the 
preferred method of sample selection, but gradually randomization became a strong compet- 
itor to balanced sampling for sample selection. By the 1920’s random sampling and purposive 
selection were both widely used as sample selection techniques. The major theoretical 
developments in both areas which occurred during this era are summarized in Bowley (1926). 
This summary includes the development of stratified random sampling with proportional alloca- 
tion and the derivation of formulae to obtain the precision of an estimate from a purposively 
selected sample. 

The equal footing of random sampling and purposive selection gradually changed after the 
publication of Neyman’s (1934) classic paper. Neyman was able to show, both theoretically 
and with practical examples, why random sampling was preferable to purposive selection for 
the large-scale sampling problems of the day. With the publication of the 1934 paper, Neyman 
also opened up new avenues of development for random sample selection techniques. 
Previously, Bowley and his followers used only sampling designs with equal inclusion 
probabilities for every population unit. Their reasoning was that this method of sampling would 
provide a representative sample of the universe. Neyman (1934) broke out of this sampling 
straitjacket with his theories of stratified sampling with ‘‘optimal’’ allocation and cluster 
sampling with ratio estimation. In both situations, ‘‘valid’’ estimates of population totals, 
means or proportions are obtained without reliance on a representative sample selected through 
a design with equal inclusion probabilities. Neyman’s final contribution to the theory of survey 
sampling was his introduction of cost functions to find the sample allocation in two-phase 
sampling which minimizes the variance subject to a fixed budget (Neyman 1938). 


Neyman’s fundamental contributions inspired various important extensions of his theory. 
Among these, we should mention ratio and regression estimation with two-phase sampling 
(Cochran 1939), determination of ‘‘optimal’’ stratification points and ‘‘optimal’’ allocation 
with multiple parameters/characters (Dalenius 1957), and sampling on two occasions with 
partial replacement of units (Jessen 1942) which was subsequently extended by Patterson 
(1950) and Hansen et al. (1953, pp. 470-503) to sampling on more than two occasions (also 
called rotation sampling). Rotation sampling and associated ‘‘composite’’ estimates are now 
extensively used to estimate levels and changes from continuing large scale, multi-purpose 
surveys (e.g., the Current Population Survey (CPS) carried out by the U.S. Bureau of the 
Census). 

Neyman’s work also greatly influenced Morris Hansen, William Hurwitz, and their col- 
leagues at the U.S. Bureau of the Census. Inspired by their practical problems in large-scale 
survey design and by Neyman’s approach to sampling theory, Hansen and Hurwitz (1943) 
developed the theory of sampling with probability proportional to size and with replacement 
(also called PPS sampling). The effect of this approach to multistage surveys is that it provides 
approximately equal interviewer work loads which makes the administration of a multistage 
survey easier. This procedure also leads to significant reductions in the variances of the 
estimates, by controlling the variability arising from unequal cluster sizes without actually 
stratifying by size and thus allowing stratification on other variables to reduce variance. The 
theory of Hansen and Hurwitz was extended by Horvitz and Thompson (1952) and Narain 
(1951) to unequal probability sampling without replacement. By making the inclusion proba- 
bilities of units at each stage proportional to their sizes, the desirable features of the Hansen- 
Hurwitz method are retained, using the so-called Horvitz-Thompson estimator of a population 
total. The basic work of Horvitz and Thompson and Narain stimulated many theoretical and 
applied contributions to unequal probability sampling without replacement. Brewer and 
Hanif (1983) and Chaudhuri and Vos (1988) have provided comprehensive accounts of these 
developments. 
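
The two estimators named in this paragraph can be sketched in a few lines. The sketch assumes the usual textbook forms, namely the Hansen-Hurwitz estimator (1/n) Σ y_i/p_i for n with-replacement draws with draw probabilities p_i, and the Horvitz-Thompson estimator Σ y_i/π_i for a without-replacement sample with inclusion probabilities π_i. The function names and the toy population are illustrative assumptions, not taken from the papers cited.

```python
import random


def hansen_hurwitz_total(y_draws, p_draws):
    """Hansen-Hurwitz estimate of a total from n with-replacement PPS draws.

    y_draws[k] is the y-value observed on draw k and p_draws[k] is the
    single-draw selection probability of that unit.
    """
    n = len(y_draws)
    return sum(y / p for y, p in zip(y_draws, p_draws)) / n


def horvitz_thompson_total(y_sample, pi_sample):
    """Horvitz-Thompson estimate of a total from distinct sampled units.

    pi_sample[k] is the inclusion probability of the k-th sampled unit.
    """
    return sum(y / pi for y, pi in zip(y_sample, pi_sample))


# Toy population (an assumption): y is exactly proportional to the size measure,
# so every ratio y_i / p_i equals the true total and PPS has zero variance.
sizes = [10, 20, 30, 40]
y = [2 * x for x in sizes]                      # true total is 200
probs = [x / sum(sizes) for x in sizes]         # draw probabilities p_i

rng = random.Random(0)
draws = rng.choices(range(len(sizes)), weights=sizes, k=3)   # PPS with replacement
hh_estimate = hansen_hurwitz_total([y[i] for i in draws], [probs[i] for i in draws])

# Two sampled units, each with inclusion probability 0.5.
ht_estimate = horvitz_thompson_total([4.0, 6.0], [0.5, 0.5])
```

When y is only roughly proportional to size, the ratios y_i/p_i vary little, which is the variance reduction described above.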

Madow and Madow (1944) have given the basic theory of systematic sampling, and 
introduced population models to examine the features of systematic sampling. Cochran (1946) 
introduced the ‘‘superpopulation’’ approach in which the finite population is regarded as being 
drawn from an infinite superpopulation having certain properties. The expected (or anticipated) 
variances under the superpopulation model are then compared to study the relative efficiency 
of alternative sampling strategies. His 1946 paper stimulated much subsequent research in the 
use of superpopulation models in the choice of sampling strategies and also for model-dependent 
or model-assisted inference (see Section 2). 

Mahalanobis (1946) developed the technique of interpenetrating subsamples, and used it 
extensively in large-scale surveys in India for assessing both sampling and non-sampling errors. 
This technique consists of drawing the sample in the form of two or more independent sub- 
samples according to the same sampling scheme such that each subsample provides a valid 
estimate of the parameter of interest. By assigning the subsamples to different interviewers 
(or interviewer teams), a valid estimate of the total variance can be obtained that takes proper 
account of the correlated response variance component due to interviewers. Deming (1960) 
used this method (sometimes called replicated sampling) extensively to obtain simple estimates 
of variance. It has led to resampling techniques such as the jackknife, balanced repeated repli- 
cation and the bootstrap for getting variance estimates of complex non-linear statistics (see 
Section 3). 
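
As a minimal sketch of the replicated sampling idea, suppose k independent subsamples drawn by the same scheme each give a valid estimate of the same parameter. The mean of the k estimates is then the overall estimate, and the spread among them gives a simple variance estimate, Σ (θ_k - mean)² / [k(k - 1)]. The function name and the numbers are illustrative assumptions.

```python
def replicate_variance(estimates):
    """Combine k independent replicate estimates of the same parameter.

    Returns (overall_estimate, variance_estimate), where the variance of the
    mean of the replicates is estimated by sum((t - mean)^2) / (k * (k - 1)).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    mean = sum(estimates) / k
    var_of_mean = sum((t - mean) ** 2 for t in estimates) / (k * (k - 1))
    return mean, var_of_mean


# Four interpenetrating subsamples, e.g. one per interviewer team (toy values).
overall, variance = replicate_variance([10.0, 12.0, 14.0, 16.0])
```

Because each replicate carries its own interviewer effect, the between-replicate spread reflects both sampling and interviewer variability, as noted above.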

Yet another milestone in the emergence of ideas and theory surrounding complex surveys 
is the concept of design effect (DEFF), due to Leslie Kish (see Kish 1965, sec. 8.2). The design 
effect is defined as the ratio of the actual variance of a statistic under the specified design to 
the variance which would be achieved under a simple random sample of the same size. The 
concept of design effect has been found to be especially useful in the presentation and modelling 
of sampling errors, and also in the analysis of survey data involving clustering and stratifica- 
tion (see Section 4). 
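
The definition of the design effect can be sketched as follows. The sketch assumes the statistic is a sample mean, that the denominator is the usual estimated variance of a mean under simple random sampling without replacement, (1 - n/N) s²/n, and that the numerator has been obtained separately by a method suited to the actual design. The function names and numbers are illustrative assumptions.

```python
def srs_variance_of_mean(y_sample, N):
    """Estimated variance of the sample mean under simple random sampling
    without replacement of n = len(y_sample) units from a population of N."""
    n = len(y_sample)
    ybar = sum(y_sample) / n
    s2 = sum((y - ybar) ** 2 for y in y_sample) / (n - 1)
    return (1 - n / N) * s2 / n


def design_effect(actual_variance, srs_variance):
    """Ratio of the variance under the actual design to the variance under
    simple random sampling with the same sample size."""
    return actual_variance / srs_variance


y_sample = [1.0, 2.0, 3.0, 4.0, 5.0]
v_srs = srs_variance_of_mean(y_sample, N=50)
# Suppose (an assumption) a design-appropriate estimate of variance is 0.9.
deff = design_effect(0.9, v_srs)
```

A value above 1 indicates that the design, typically through clustering, is less precise than a simple random sample of the same size.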


2. THEORETICAL FOUNDATIONS 


Although Neyman (1934) and others obtained best linear unbiased estimators for simple 
designs using the standard Gauss-Markov set-up, the development of traditional sampling 
theory progressed more or less inductively. Estimators (and designs) which appeared reasonable 
were considered and their relative properties carefully studied by analytical and/or empirical 
methods, mainly through comparisons of bias and mean square error, and sometimes also 
using anticipated mean square error or variance under plausible superpopulation models. As 
noted by Hansen et al. (1983), unbiasedness of estimators under a given design was not insisted 
on since it ‘‘often results in much larger mean square errors than necessary’’. Instead, asymp- 
totic design consistency of estimators was insisted on, at least when aggregate estimates from 
reasonably large samples are needed, and the mean square errors of selected asymptotically 
design consistent estimators were compared to arrive at a suitable estimator (and design). 
Moreover, in large-scale surveys involving a great many statistics, uniform estimation pro- 
cedures are often insisted on at the expense of variance inflation for some statistics (compared 
to alternative estimators tailored to each statistic), due to time, cost and other operational 
constraints. 

Despite the usefulness of the traditional approach, the need for a formal framework for 
inference from survey data was long felt. Realizing this need, several statisticians have made 
important contributions to the theoretical foundations of inference from survey data, especially 
during the past 10-20 years. Several review papers (see e.g., Chaudhuri 1988) and two books 
(Cassel et al., 1977; Chaudhuri and Vos 1988) discuss various aspects of the theoretical 
foundations. 

Most papers on the theoretical foundations of sampling theory have assumed the following 
somewhat idealistic set-up. A survey population $U$ consists of $N$ distinct elements identified 
through the labels $j = 1, \ldots, N$. The characteristic of interest $y_j$ (possibly vector-valued) 
associated with element $j$ can be known exactly by observing element $j$. Thus response or 
measurement errors are assumed to be absent or ignored if present. The parameter of interest 
is the population total $Y = y_1 + \cdots + y_N$ or the population mean $\bar{Y} = Y/N$ (if $N$ is 
known). A sample is a subset $s$ of $U$ and the associated $y$-values, i.e., $\{(i, y_i), i \in s\}$, selected 
according to a sampling plan which assigns a known probability $p(s)$ to $s$ such that $p(s) \geq 0$ 
for all $s \in S$ (the set of all possible $s$) and $\sum_{s \in S} p(s) = 1$. The selection probability $p(s)$ can 
depend on known design variables $z = (z_1, \ldots, z_N)'$, such as stratum indicator variables 
and size measures of clusters, i.e., $p(s) = p(s \mid z)$ where $z_j$ is possibly vector-valued. For 
probability sampling, the inclusion probabilities $\pi_i = \sum_{\{s:\, i \in s\}} p(s)$ are positive, which 
permits unbiased or consistent estimation of $Y$ in the traditional sense. It is also customary 
to impose the condition that the joint inclusion probabilities $\pi_{ij} = \sum_{\{s:\, (i,j) \in s\}} p(s)$ be 
positive, which permits unbiased or consistent variance estimation in the traditional sense. 
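
The set-up above can be made concrete for a population small enough to enumerate. The sketch below represents a sampling plan as a mapping from each possible sample to its probability p(s), and computes the first-order and joint inclusion probabilities by summing p(s) over the samples that contain the unit or pair. It then checks that the estimator Σ y_i/π_i over the sample has design expectation equal to the total. The plan (simple random sampling of 2 from 4), the y-values and the function names are illustrative assumptions, and labels run from 0 rather than 1.

```python
from itertools import combinations


def inclusion_probabilities(plan, N):
    """First-order and joint inclusion probabilities of a sampling plan.

    plan maps each sample (a tuple of distinct labels 0..N-1) to p(s).
    Returns (pi, pij) with pi[i] = sum of p(s) over samples containing i and
    pij[i][j] = sum of p(s) over samples containing both i and j (i != j).
    """
    pi = [0.0] * N
    pij = [[0.0] * N for _ in range(N)]
    for s, p in plan.items():
        for i in s:
            pi[i] += p
            for j in s:
                if i != j:
                    pij[i][j] += p
    return pi, pij


def ht_estimate(s, y, pi):
    """Sum of y_i / pi_i over the sampled labels."""
    return sum(y[i] / pi[i] for i in s)


# Simple random sampling without replacement: every pair equally likely.
N, n = 4, 2
samples = list(combinations(range(N), n))
plan = {s: 1.0 / len(samples) for s in samples}

pi, pij = inclusion_probabilities(plan, N)
y = [3.0, 5.0, 8.0, 12.0]                      # true total is 28

# Design expectation: average of the estimate over all samples, weighted by p(s).
design_expectation = sum(p * ht_estimate(s, y, pi) for s, p in plan.items())
```

Here every π_i equals n/N = 0.5, every joint probability equals 1/6, and the design expectation equals the true total of 28 up to floating-point rounding, whatever the y-values are.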

The basic problem is to make inferences (estimation, variance estimation and constructing 
confidence intervals) about the total $Y$ by observing a sample selected according to a specified 
sampling plan $p(s)$ and also using available supplementary data. This involves essentially three 
steps: (i) choice of a sampling plan; (ii) choice of an estimator $\hat{Y}$; (iii) choice of a variance 
estimator and confidence intervals. There are essentially three different approaches to 
implement these steps: (i) design-based approach, also called probability sampling approach or 
randomization approach; (ii) model-dependent approach, also called prediction approach or 
probability speculation approach (Hájek 1981); (iii) a hybrid approach, called model-based 
approach or model-assisted approach. Developments to date under each of these three 
approaches are discussed below. 


2.1 Design-based Approach 


This approach uses probability sampling both for sample selection and for inference from the 
data. The probability sampling distribution provides valid inferences irrespective of the 
population $y$-values, even in complicated situations, in the sense that the pivotal $t = (\hat{Y} - Y)/s(\hat{Y})$ 
is approximately $N(0,1)$, at least for large samples, where $s(\hat{Y})$ is the standard error of $\hat{Y}$. This 
approach has been criticized on the grounds that such inferences, although assumption-free, refer 
to repeated sampling from the survey population involving all samples $s \in S$ and the associated 
probabilities $p(s)$, instead of just the particular $s$ that has been drawn. This criticism can be 
countered to some extent either by using conditional design-based inference referring to a subset 
of $S$ that is ‘‘relevant’’ to the particular $s$ or by a model-assisted approach. 

Horvitz and Thompson (1952) made a basic contribution to foundational aspects of design- 
based inference by formulating three classes of linear estimators of Y, and then raising the 
possibility that the best (minimum variance) estimator among all possible linear unbiased 
estimators of $Y$ may not exist, even for simple random sampling. Prompted by the 
Horvitz-Thompson formulation, Godambe (1955) proposed a general class of linear estimators given 
by $\hat{Y}_b = \sum_{i \in s} b_{si} y_i$, where the weight $b_{si}$ is attached to element $i$ if $s$ is selected and $i \in s$. He 
proved that no best unbiased estimator of $Y$ could exist in this class, for any sampling plan 
p(s). Since the criterion of minimum variance had failed, several alternative criteria for the 
choice of an estimator were proposed. Among these, the admissibility criterion is of some use 
but is not sufficiently selective in distinguishing between the merits of estimators since too many 
estimators are admissible. Ghosh (1987) provides an excellent survey of results on admissibility 
and related criteria in finite population sampling. New criteria that give rise to a unique choice 
of estimator in the Godambe class for any sampling plan have also been put forth, but the 
optimality properties established have questionable relevance (see Rao 1971, Rao and Singh 
1973). Basu’s (1971) well-known ‘‘elephants’’ example demonstrates the futility of two such 
criteria, viz. necessary bestness and hyperadmissibility. 

Godambe (1966) obtained the likelihood function from the sample $\{(i, y_i), i \in s\}$ regarding 
the $N$-vector $y = (y_1, \ldots, y_N)'$ as the parameter of interest, but it provides no information 
on $(y_i: i \notin s)$, and hence on the total $Y$, since the $N$ population units are essentially treated 
as $N$ separate post strata. A way out of this difficulty is to ignore some of the data to make 
the sample non-unique and arrive at an informative likelihood function (Hartley and Rao 1968; 
Royall 1968). Another route is to combine the uninformative likelihood function with 
exchangeable priors via Bayes theorem to arrive at informative posterior inferences (Ericson 1969). 

Conditional inference has attracted considerable attention (and controversy) in classical 
statistics since Fisher (1925). The choice of a relevant reference set for making conditional 
inference is not always clear-cut, but in the context of post-stratification it seems sensible to 
make design-based inferences conditional on the realized strata sample sizes (Durbin 1969). 
Holt and Smith (1979) provide the most compelling arguments in favour of conditional design- 
based inference, although their discussion was confined to post-stratification of a simple 
random sample. Rao (1985) considered a number of real examples involving random sample 
sizes to illustrate conditional design-based inference and associated difficulties. 

Robinson (1987) considered conditional design-based inference from a simple random 
sample when only the population total $X$ of a concomitant variable $x$ is known. By conditioning 
on the observed sample mean $\bar{x}$, he showed that the usual ratio estimator $\hat{Y}_r = (\bar{y}/\bar{x})X$ is 
conditionally biased. He obtained a conditional bias adjusted ratio estimator given by 

$$\hat{Y}_r(\mathrm{adj}) = \hat{Y}_r + N(r - b)(\bar{x} - \bar{X})\bar{X}/\bar{x}, \tag{2.1}$$

where $r = \bar{y}/\bar{x}$, $\bar{X} = X/N$ and $b$ is the sample regression coefficient. He also showed that a customary 
variance estimator 

$$s_1^2(\hat{Y}_r) = N^2 (1 - n/N) \sum_{i \in s} (y_i - r x_i)^2 / [n(n - 1)] \tag{2.2}$$

is conditionally biased, while another classical variance estimator 

$$s_2^2(\hat{Y}_r) = (\bar{X}/\bar{x})^2 s_1^2(\hat{Y}_r) \tag{2.3}$$

is in fact conditionally unbiased, for large $n$. Robinson also showed, through a simulation study, 
that $s_2^2(\hat{Y}_r)$ is very close to the estimator of conditional variance of $\hat{Y}_r(\mathrm{adj})$. 
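
The quantities in (2.1) to (2.3) can be computed directly from a simple random sample. The sketch below reads (2.2) as N²(1 - n/N) Σ (y_i - r x_i)² / [n(n - 1)], (2.3) as (2.2) multiplied by (X̄/x̄)², and the adjustment in (2.1) as N(r - b)(x̄ - X̄)X̄/x̄, with b taken to be the ordinary least squares slope. That reading of the formulas, the function name and the toy data are all assumptions.

```python
def ratio_estimation_summary(y, x, N, X_total):
    """Ratio estimator of a total from a simple random sample, with the
    conditional bias adjustment and the two variance estimators."""
    n = len(y)
    ybar, xbar = sum(y) / n, sum(x) / n
    Xbar = X_total / N
    r = ybar / xbar
    y_ratio = r * X_total                                   # usual ratio estimator

    # Sample regression coefficient b (ordinary least squares slope).
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    y_adjusted = y_ratio + N * (r - b) * (xbar - Xbar) * Xbar / xbar   # (2.1)

    resid_ss = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y))
    s1_sq = N ** 2 * (1 - n / N) * resid_ss / (n * (n - 1))            # (2.2)
    s2_sq = (Xbar / xbar) ** 2 * s1_sq                                 # (2.3)
    return {"ratio": y_ratio, "adjusted": y_adjusted, "s1_sq": s1_sq, "s2_sq": s2_sq}


# Toy sample of n = 4 from N = 10 with known x-total 30, so the sample
# mean of x (2.5) falls below the population mean (3.0).
result = ratio_estimation_summary(y=[2.0, 3.0, 7.0, 8.0], x=[1.0, 2.0, 3.0, 4.0],
                                  N=10, X_total=30.0)
```

In this sample x̄ is smaller than X̄, so the second variance estimator exceeds the first by the factor (X̄/x̄)² = 1.44, which is the kind of dependence on the realized x̄ that conditional inference is meant to capture.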


2.2 Model-dependent Approach 


A strict model-dependent approach involves purposive sampling, and the model 
distribution (generated from hypothetical realizations of $y = (y_1, \ldots, y_N)'$ obeying the model) 
provides valid inferences referring to the particular sample $s$ that has been drawn. 

The model-dependent approach was first proposed by Brewer (1963) and extensively studied 
by Royall and his co-workers, starting with Royall (1970). It is best illustrated under a simple 
regression model 

$$E_m(y_i) = \beta x_i, \quad i = 1, \ldots, N, \tag{2.4}$$

where $E_m$ denotes the model expectation. It is further assumed that the model variance 
$V_m(y_i) = \sigma_i^2$ where $\sigma_i^2$ is known except for a multiplicative constant, and that the model 
covariance $\mathrm{cov}_m(y_i, y_j) = 0$, $i \neq j$. Royall (1970) showed that the customary 
design-unbiased estimator, $N\bar{y}$, under simple random sampling is biased under the model given by 
(2.4), and that $N\bar{y}$ leads to serious underestimation if the observed sample contains mostly units 
with small sizes, $x_i$. These results can also be shown under the conditional design-based 
approach without assuming a model (Rao 1985). 

The best linear model unbiased estimator (or prediction estimator) of $Y$ under the model 
(2.4) is given by 

$$\hat{Y} = \sum_{i \in s} y_i + \hat{\beta} \sum_{i \in \bar{s}} x_i, \tag{2.5}$$

which reduces to the usual ratio estimator $\hat{Y}_r$ if $\sigma_i^2 = \sigma^2 x_i$, where $\bar{s} = U - s$ is the set of 
non-sampled units and $\hat{\beta}$ is the best linear unbiased estimator of $\beta$. The uncertainty in $\hat{Y}$ is measured 
by $E_m(\hat{Y} - Y)^2 = V_m(\hat{Y} - Y)$ which in the case of $\hat{Y}_r$ reduces to 

$$V_m(\hat{Y}_r - Y) = \{X(X - n\bar{x})/(n\bar{x})\}\sigma^2. \tag{2.6}$$

Since (2.6) decreases as $\bar{x}$ increases, the optimal design is a purposive sample consisting of the 
$n$ units whose $x$-values are largest, assuming that the population $x_i$’s are known. A model 
unbiased estimator, $s_m^2(\hat{Y}_r - Y)$, of $V_m(\hat{Y}_r - Y)$ is obtained from (2.6) by replacing $\sigma^2$ with 
its weighted least squares estimator $\hat{\sigma}^2$, and the resulting pivotal $t_m = (\hat{Y}_r - Y)/s_m(\hat{Y}_r - Y)$ 
is approximately $N(0,1)$ under the model distribution. These theoretical results are impressive, 
but such model-dependent strategies could lead to serious biases if the assumed model is not 
completely correct. 
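
A small numerical sketch may help fix ideas for (2.5) and (2.6). It assumes the variance function σ_i² = σ²x_i, under which the best linear unbiased estimator of β is Σ y_i / Σ x_i over the sample, so that the prediction estimator equals the ratio estimator. The function names and the toy population are illustrative assumptions.

```python
def prediction_estimator(y_sampled, x_sampled, x_nonsampled):
    """Prediction estimator (2.5): observed total plus predicted total of the
    non-sampled units, with beta_hat = sum(y) / sum(x) over the sample
    (the best linear unbiased choice when the model variance is sigma^2 * x_i)."""
    beta_hat = sum(y_sampled) / sum(x_sampled)
    return sum(y_sampled) + beta_hat * sum(x_nonsampled)


def model_variance(X_total, x_sampled, sigma2):
    """Model variance (2.6): X (X - n*xbar) / (n*xbar) * sigma^2,
    where n*xbar is the sample total of x."""
    n_xbar = sum(x_sampled)
    return X_total * (X_total - n_xbar) / n_xbar * sigma2


# Toy population of N = 10 units with x-total 30 (an assumption).
x_population = [1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 3.0, 4.0, 4.0, 4.0]
X_total = sum(x_population)

# Sample consisting of the first four units.
x_s, y_s = x_population[:4], [2.0, 3.0, 7.0, 8.0]
y_pred = prediction_estimator(y_s, x_s, x_population[4:])
ratio_form = (sum(y_s) / sum(x_s)) * X_total          # (ybar / xbar) * X

# Model variance for that sample versus the four units with the largest x.
v_first_four = model_variance(X_total, x_s, sigma2=1.0)
v_largest_four = model_variance(X_total, sorted(x_population)[-4:], sigma2=1.0)
```

The second variance is smaller than the first, which illustrates why the model, taken at face value, points to a purposive sample of the units with the largest x-values.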

To protect against model misspecifications, Royall and Herson (1973 a,b) considered model 
deviations consisting of second or higher order polynomial terms in $x$ (say $q$-th order) or an 
intercept or both, and demonstrated that a balanced sample for which $\bar{x}^{(j)} = \bar{X}^{(j)}$, $j = 1, 
\ldots, q$ provides robustness in the sense that $\hat{Y}_r$ remains model unbiased, where $\bar{x}^{(j)} = \sum_{i \in s} 
x_i^j/n$ and $\bar{X}^{(j)} = \sum_{i \in U} x_i^j/N$. Further, they have shown that stratification on $x$ with optimal 
allocation and balanced sampling within each stratum together with the separate ratio estimator 
of $Y$ provides increased efficiency. Purposively chosen balanced samples have a number of 
difficulties, nevertheless. First, due to lack of rigorous rules in the sample selection one might 
be tempted to select units whose $x_i$ are close to $\bar{X}$ (in the case of $q = 1$) which can produce 
an unrepresentative sample if $y$ is positively correlated with $x$ (Yates 1960, p. 40). Second, 
balancing is sensitive to departures from the polynomial regression model (Madow 1978, 
p. 320). Balance is required on the alternative model, which may contain higher-order poly- 
nomial terms or other variables or both, and the extra variables in the alternative model must 
be known in advance. Third, balanced sampling is not feasible for surveys with multiple 
characters of interest since different samples may be required for each variable. 

If the extra concomitant variables z in the model are unknown or unmeasured, Royall and 
Pfeffermann (1982) recommend simple random sampling since it provides ‘‘grounds for con- 
fidence that the selected sample is not badly unbalanced on z’’, but more recently Royall and 
Cumberland (1988) seem to favour some form of restricted randomization: ‘‘Many techniques, 
including restricted randomization, stratification and systematic sampling, can be used to help 
achieve balanced samples. We are not advocating one scheme over another; ...”’. In any case, 
it appears that most advocates of the model-dependent approach seem to recommend pro- 
bability sampling in some form, as noted by Smith (1984), and hence the main difference 
between the probability sampling approach and the model-dependent approach is in the choice 
of the pivotal involving the estimator Y and a measure of its uncertainty. 

Despite the above-mentioned limitations, the model-dependent approach is useful for 
studying the conditional performances of conventional procedures, under different plausible 
models. For instance, the variance estimator $s_2^2(\hat{Y}_r)$ of (2.3) is consistent with the behaviour of the 
conditional variance $V_m(\hat{Y}_r - Y)$ under the model (2.4) with $\sigma_i^2 = \sigma^2 x_i$, while $s_1^2(\hat{Y}_r)$ of (2.2) is 
model-biased (Royall and Eberhardt 1975). The variance estimator $s_2^2(\hat{Y}_r)$ is also robust to 
deviations from the assumption $\sigma_i^2 = \sigma^2 x_i$. 


2.3 Model-assisted Approach 


Hansen, Madow and Tepping (1983) illustrated the dangers in using model-dependent 
strategies even when the model is apparently consistent with the sample data. By introducing 
a misspecification to the model (2.4) which is not detectable through tests of significance from 
samples as large as 400, they showed that the design-based coverage of the confidence intervals 
derived from the model-dependent pivotal $t_m = (\hat{Y}_r - Y)/s_m(\hat{Y}_r - Y)$ is substantially less than 
the desired level and that it becomes worse as the sample size increases. The poor performance 
of $t_m$ was due to the asymptotic inconsistency of the estimator $\hat{Y}_r$ with respect to their stratified 
random sampling design. 

The model-assisted approach considers only asymptotically design consistent estimators $\hat{Y}$ 
that are also model unbiased under an assumed model. Variance estimators that are 
consistent for the design variance of $\hat{Y}$ and at the same time model unbiased (at least approximately) 
for the conditional variance $V_m(\hat{Y} - Y)$ are also constructed. Thus the resulting pivotal leads 
to valid inferences under an assumed model and at the same time protects against model 
misspecifications in the sense of providing valid design-based inferences irrespective of the 
population $y$-values. However, very little attention has been given to studying conditional 
design-based properties of model-assisted strategies under model misspecifications. 

Godambe (1955) assumed the model (2.4) with $V_m(y_i) = \sigma_i^2$ and $\mathrm{cov}_m(y_i, y_j) = 0$, $i \neq j$, 
and obtained a lower bound, $\sum_{i \in U} (1/\pi_i - 1)\sigma_i^2$, to the anticipated variance of any design 
unbiased linear estimator, $\hat{Y}_b$. He also showed that any fixed sample size plan with $\pi_i = 
(n x_i)/X$ together with the Horvitz-Thompson estimator, $\hat{Y}_{HT} = \sum_{i \in s} y_i/\pi_i$, attains the lower 
bound, provided $\sigma_i^2 = \sigma^2 x_i^2$. ‘‘Optimal’’ design unbiased strategies do not exist if $\sigma_i^2 \neq \sigma^2 x_i^2$, 
and as a result asymptotically optimal strategies were developed by relaxing the restriction to 
design unbiased estimators and considering asymptotically design-consistent estimators. The 
generalized regression estimator 

$$\hat{Y}_{reg} = \sum_{i \in s} y_i/\pi_i + \hat{\beta}\Big(X - \sum_{i \in s} x_i/\pi_i\Big) \tag{2.7}$$

for any fixed sample size plan with $\pi_i$ proportional to $\sigma_i$ is asymptotically optimal (i.e., 
the asymptotic anticipated variance attains the lower bound), where $\hat{\beta}$ is a linear model 
unbiased estimator of $\beta$ and $E_m E_p(\hat{\beta} - \beta)^2 \to 0$ as $n \to \infty$, where $E_p$ denotes the design 
expectation (rete 1980). In particular, the best model unbiased estimator $\hat{\beta} = (\sum_{i \in s} w_i x_i y_i)/ 
(\sum_{i \in s} w_i x_i^2)$ with $w_i = 1/\sigma_i^2$ may be chosen. 

If $\hat{\beta} = (\sum_{i \in s} w_i x_i y_i/\pi_i)/(\sum_{i \in s} w_i x_i^2/\pi_i)$ with $w_i = 1/x_i$ is chosen, then $\hat{Y}_{reg}$ reduces to the 
simpler form (ratio estimator) 

$$\hat{Y}_{reg} = X\hat{\beta} = \sum_{i \in s} g_{si} y_i/\pi_i, \tag{2.8}$$

where $g_{si} = X/(\sum_{k \in s} x_k/\pi_k)$ and $g_{si}$ converges in probability to 1 as $n \to \infty$ (Särndal and 
Wright 1984). Särndal, Swensson and Wretman (1989) proposed a new variance estimator for 
estimators $\hat{Y}$ of the form (2.8) which is design consistent and at the same time approximately 
unbiased for the conditional variance $V_m(\hat{Y} - Y)$. Their variance estimator for $\hat{Y}_{reg}$ is given by 

$$s^2(\hat{Y}_{reg}) = \sum_{i<j \in s} \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}} (g_{si}\tilde{e}_i - g_{sj}\tilde{e}_j)^2, \tag{2.9}$$

where $\tilde{e}_i = (y_i - \hat{\beta} x_i)/\pi_i$. For simple random sampling, $s^2(\hat{Y}_{reg})$ reduces to $s_2^2(\hat{Y}_r)$, given 
by (2.3), which was justified under the prediction and conditional randomization approaches. 
Kott (1987) proposed a ratio adjustment to the conventional Yates-Grundy variance estimator, 
s¥cg(Y), of any model unbiased asymptotically design consistent estimator Y. His variance 
estimator 
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SG(Y) = stg(Y)1Vn(Y — Y)/Ems¥c(Y)] (2.10) 


is model unbiased and at the same time asymptotically design consistent. However, for
estimators of the form (2.8) the variance estimator of Särndal et al. appears simpler since it is
obtained simply from the conventional variance estimator $s^2_{YG}(\hat{Y})$ by changing $\tilde{e}_i$ to $g_{si}\tilde{e}_i$.
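
The ratio form (2.8) and the g-weighted variance estimator (2.9) can be sketched in a few lines. This is only an illustrative sketch: the function names and the callable `pi_joint`, which returns the joint inclusion probability $\pi_{ij}$, are assumptions introduced here and are not taken from any of the papers cited.

```python
from itertools import combinations


def ratio_greg(y, x, pi, X_total):
    """Ratio-form regression estimator (2.8); returns (estimate, beta_hat, g)."""
    x_ht = sum(xi / p for xi, p in zip(x, pi))   # Horvitz-Thompson estimate of X
    y_ht = sum(yi / p for yi, p in zip(y, pi))   # Horvitz-Thompson estimate of Y
    beta_hat = y_ht / x_ht                       # beta-hat with w_i = 1/x_i
    g = X_total / x_ht                           # g_si, identical for every i here
    return X_total * beta_hat, beta_hat, g


def g_weighted_variance(y, x, pi, pi_joint, X_total):
    """Variance estimator (2.9): Yates-Grundy form with residuals scaled by g_si.

    pi_joint(i, j) is a hypothetical callable giving the joint inclusion
    probability of sampled units i and j.
    """
    _, beta_hat, g = ratio_greg(y, x, pi, X_total)
    e = [(yi - beta_hat * xi) / p for yi, xi, p in zip(y, x, pi)]
    total = 0.0
    for i, j in combinations(range(len(y)), 2):
        pij = pi_joint(i, j)
        total += (pi[i] * pi[j] - pij) / pij * (g * e[i] - g * e[j]) ** 2
    return total


# Simple random sampling without replacement, n = 4 from N = 10 (made-up data).
n, N = 4, 10
pi = [n / N] * n
estimate, beta_hat, g = ratio_greg([2.0, 5.0, 6.0, 9.0], [1.0, 2.0, 3.0, 4.0], pi, 30.0)
variance = g_weighted_variance(
    [2.0, 5.0, 6.0, 9.0], [1.0, 2.0, 3.0, 4.0], pi,
    lambda i, j: n * (n - 1) / (N * (N - 1)), 30.0,
)
```

Setting `g` to 1 in `g_weighted_variance` would give the conventional Yates-Grundy estimator applied to the same residuals, which is the comparison made in the text.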

The conventional regression estimator is obtained by first considering a fixed constant $B$
in place of $\hat{\beta}$ in (2.7), and then substituting a consistent estimator of $B_{opt}$, the value of $B$
minimizing the design variance. This estimator does not depend on the validity of any model.
However, the optimal design variance can be approximately attained in the model-assisted
framework by modifying the model (2.4) to $E(y_i) = \beta x_i + \gamma \pi_i$ and then using $(\tilde{\beta}, \tilde{\gamma})'$, the
weighted regression estimator of $(\beta, \gamma)'$ with weights $w_i = 1/\pi_i^2$. The resulting estimator of
$Y$ reduces to (2.7) with $\hat{\beta}$ changed to $\tilde{\beta}$ (Isaki and Fuller 1982; Montanari 1987). Any other
choice of $\hat{\beta}$ in (2.7) will give a larger asymptotic design variance.

Little (1983) argued that only models that yield asymptotically design consistent, best linear 
model unbiased estimators should be used since the latter estimators are optimal if the model 
is in fact true. One way to accomplish this is by introducing an additional auxiliary variable 
$u_i = \sigma_i^2(1 - \pi_i)/\pi_i$ into the model (2.4), i.e. by using $E(y_i) = \beta x_i + \gamma u_i$ (Särndal and
Wright 1984). If we change the model to $E(y_i) = \beta x_i + \gamma\sigma_i^2/\pi_i + \delta\sigma_i^2$ by adding two
auxiliary variables $\sigma_i^2/\pi_i$ and $\sigma_i^2$ to the model (2.4), then we get an asymptotically design
consistent, best linear model unbiased estimator of the form $\hat{Y} = \sum_{i \in s} g_{si} y_i/\pi_i$ (Särndal and
Wright 1984). The lower bound to asymptotic anticipated variance is also attained if we choose
a sampling plan with $\pi_i$ proportional to $\sigma_i$. The above desirable properties, however, are
obtained at the expense of a slight increase in the model variance under the original model (2.4). 

Godambe and Thompson (1986) employed the theory of estimating functions to derive design 
consistent estimators through an assumed model. For example, if $y_i$ is expected to be unrelated
to $\pi_i$ for some character $y$ in a multisubject survey, then the ‘‘optimal’’ estimating function
gives the Hájek (1971) estimator of $\bar{Y}$:


$$\hat{\bar{Y}}_H = \Big(\sum_{i \in s} y_i/\pi_i\Big) \Big/ \Big(\sum_{i \in s} 1/\pi_i\Big). \tag{2.11}$$


The superpopulation model here is given by $y_i = \theta + e_i$, with independent errors $e_i$, which
reflects the situation at hand. The estimator $\hat{\bar{Y}}_H$ avoids the difficulties associated with the
Horvitz-Thompson estimator $\hat{Y}_{HT}/N$, as illustrated by the ‘‘elephants’’ example of Basu
(1971). The method of estimating functions looks promising, but further work remains to be 
done on its use in getting ‘‘better’’ estimators or pivotals or both. It is interesting to note that 
the well-known Fieller method of computing confidence limits for a ratio (Fieller 1932) and 
the method of Woodruff (1952) for computing confidence limits for medians are essentially 
equivalent to the method of estimating functions. 
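
A small numerical sketch may help to show why the Hájek form (2.11) behaves better than $\hat{Y}_{HT}/N$ when $y_i$ is unrelated to $\pi_i$. The function names and the data are invented for illustration only.

```python
def ht_mean(y, pi, N):
    """Horvitz-Thompson estimator of the mean: (sum of y_i / pi_i) / N."""
    return sum(yi / p for yi, p in zip(y, pi)) / N


def hajek_mean(y, pi):
    """Hajek estimator (2.11): ratio of the weighted y-total to the weighted count."""
    numerator = sum(yi / p for yi, p in zip(y, pi))
    denominator = sum(1.0 / p for p in pi)
    return numerator / denominator


# A constant character observed under very unequal inclusion probabilities.
y = [5.0, 5.0, 5.0]
pi = [0.1, 0.5, 0.5]
mean_ht = ht_mean(y, pi, N=10)   # depends on which probabilities were drawn
mean_hajek = hajek_mean(y, pi)   # reproduces the constant for any sample
```

When every $y_i$ equals the same constant, `hajek_mean` returns that constant whatever the sample, whereas `ht_mean` varies with the sampled $\pi_i$; this is the difficulty that the ‘‘elephants’’ example dramatizes.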

The results in Sections 2.2 and 2.3 use models appropriate to unistage sampling. In the case 
of multistage sampling, the models are more complex due to intra-cluster correlations (Scott 
and Smith 1969; Montanari 1987). The resulting best linear model unbiased estimators or predic- 
tion estimators involve weighted combinations of estimators, where the weights depend on intra- 
cluster correlations which can be estimated from the sample data. Bellhouse and Rao (1986) 
investigated the relative efficiency of such estimators, under the repeated sampling framework. 
Their empirical results suggest that the prediction estimators may not be significantly more 
efficient than the customary estimator in two-stage sampling with PPS sampling of clusters 
and simple random sampling within sampled clusters.
If the clusters are regarded as strata and if the strata means are the parameters of interest
as in small area estimation, then the prediction estimators of strata means are likely to be 
significantly more efficient than the customary design-based estimators since the prediction 
estimators ‘‘borrow strength’’ from all the strata unlike the customary estimators. In the case 
of two-stage sampling with cluster means as parameters of interest, only a prediction estimator 
for the nonsampled clusters can be implemented. 


3. VARIANCE ESTIMATION AND CONFIDENCE INTERVALS 


3.1 Linear Statistics 


A substantial part of traditional sampling theory is devoted to the derivation of mean
square errors or variances of linear estimators of a total $Y$, and their estimators. Rao (1979)
developed a unified approach for estimators belonging to Godambe’s general linear class,
$\hat{Y}_b = \sum_{i \in s} b_{is} y_i$, which enables the derivation of mean square error in a straightforward
fashion, and also exhibits the necessary form of any non-negative quadratic unbiased estimator
of the mean square error. For multistage designs, a general estimator of $Y$ is of the form
$\hat{Y}_{bm} = \sum_{i \in s} b_{is} \hat{Y}_i$, where $s$ now denotes a sample of primary sampling units (psu’s) and $\hat{Y}_i$ is
an unbiased linear estimator of psu total $Y_i$ based on subsampling the psu. Unified variance
formulae for multistage designs have been worked out by Raj (1966) and Rao (1975). 

Large scale surveys often employ many strata, $L$, with relatively few psu’s, $n_h$, sampled
within each stratum $h$. In fact, it is a common practice to select $n_h = 2$ psu’s within each
stratum to permit maximum degree of stratification of psu’s consistent with the provision of
a valid variance estimator. If the psu’s are sampled with replacement with probabilities $p_{hi}$ in
stratum $h$, then the estimator of total $Y$ is given by $\hat{Y} = \sum_h \bar{r}_h$, and an unbiased variance
estimator is simply obtained as


$$s^2(\hat{Y}) = \sum_h \sum_i (r_{hi} - \bar{r}_h)^2 / [n_h(n_h - 1)], \tag{3.1}$$


where $\bar{r}_h = \sum_i r_{hi}/n_h$, $r_{hi} = \hat{Y}_{hi}/p_{hi}$ and $\hat{Y}_{hi}$ is an unbiased estimator of the i-th psu total in
stratum $h$ ($i = 1, \ldots, n_h$; $h = 1, \ldots, L$). This stratified design is frequently used in
comparing methods for nonlinear statistics (Section 3.2). Because of its simplicity, $s^2(\hat{Y})$ is often
used even when the psu’s are sampled without replacement. This procedure leads to overestima- 
tion of variance, but the relative bias would be small if the first stage sampling fraction is small. 
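
As a minimal sketch of (3.1), the following function takes the values $r_{hi}$ grouped by stratum and returns the estimated total with its variance estimate. The dictionary layout and the function name are assumptions made for this illustration.

```python
def stratified_total_and_variance(r):
    """Estimate of the total and variance estimator (3.1).

    r maps each stratum label h to the list of r_hi = Yhat_hi / p_hi
    for the psu's sampled with replacement in that stratum.
    """
    total, variance = 0.0, 0.0
    for values in r.values():
        n_h = len(values)
        if n_h < 2:
            raise ValueError("each stratum needs at least two psu's")
        r_bar = sum(values) / n_h
        total += r_bar
        variance += sum((v - r_bar) ** 2 for v in values) / (n_h * (n_h - 1))
    return total, variance


# Two strata with n_h = 2 psu's each (made-up values).
total, variance = stratified_total_and_variance({"h1": [10.0, 14.0], "h2": [20.0, 20.0]})
```

With $n_h = 2$ each stratum contributes $(r_{h1} - r_{h2})^2/4$, which is the form most often seen in practice.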


3.2 Non-linear Statistics 


Many non-linear, finite population parameters of interest, $\theta$, such as ratio, regression
and correlation coefficients, can be expressed as smooth functions, $g(Y)$, of totals
$Y = (Y_1, \ldots, Y_q)'$ of suitably defined variates such that $g(Y) = g_1(Y_1/M, \ldots, Y_{q-1}/M)$
where $Y_q = M$, the population size. The parameter $\theta$ is estimated by
$g(\hat{Y}) = g_1(\hat{Y}_1/\hat{M}, \ldots, \hat{Y}_{q-1}/\hat{M})$. Such estimators are well-behaved even when the variates attached to the elements
$t$ are not related to the inclusion probabilities $\pi_t$ ($t = 1, \ldots, M$) since $g(\hat{Y})$ is a function only
of the Hájek-type estimators $\hat{\bar{Y}}_i = \hat{Y}_i/\hat{M}$ of the means $\bar{Y}_i$. As an example of $g(\hat{Y})$, the
estimator of a finite population regression coefficient $B = \sum (x_t - \bar{X})(y_t - \bar{Y})/\sum (x_t - \bar{X})^2$
can be written as


$$\hat{B} = [\hat{Z}/\hat{M} - (\hat{X}/\hat{M})(\hat{Y}/\hat{M})][\hat{W}/\hat{M} - (\hat{X}/\hat{M})^2]^{-1}, \tag{3.2}$$

where $\hat{X}$, $\hat{Z}$ and $\hat{W}$ are the estimators of the totals $X$, $Z$ and $W$ of the variates $x_t$, $z_t = y_t x_t$
and $w_t = x_t^2$ respectively.
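
The estimator (3.2) is simply a function of weighted sample totals, as the following sketch shows. It assumes, for illustration, that each sampled element carries a survey weight `w[t]` (for instance $1/\pi_t$) and that every estimated total is the weighted sum over the sample; the function name is hypothetical.

```python
def regression_coefficient(y, x, w):
    """Design-weighted regression coefficient in the form of (3.2)."""
    M_hat = sum(w)                                            # estimated population size
    X_hat = sum(wt * xt for wt, xt in zip(w, x))              # estimated total of x_t
    Y_hat = sum(wt * yt for wt, yt in zip(w, y))              # estimated total of y_t
    Z_hat = sum(wt * xt * yt for wt, xt, yt in zip(w, x, y))  # estimated total of y_t * x_t
    W_hat = sum(wt * xt * xt for wt, xt in zip(w, x))         # estimated total of x_t ** 2
    numerator = Z_hat / M_hat - (X_hat / M_hat) * (Y_hat / M_hat)
    denominator = W_hat / M_hat - (X_hat / M_hat) ** 2
    return numerator / denominator


# If y is exactly linear in x, the slope is recovered whatever the weights.
slope = regression_coefficient([4.0, 7.0, 10.0, 13.0], [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0])
```

Only ratios of estimated totals to $\hat{M}$ enter the calculation, which is the Hájek-type property noted in the text.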

Variance estimation methods for non-linear statistics, $g(\hat{Y})$, include the well-known
linearization method and resampling techniques like the jackknife, balanced repeated replica- 
tion (BRR) and the bootstrap. The linearization method is applicable to general sampling 
designs, but it involves a separate variance formula for each statistic. On the other hand, 
resampling methods use a single variance formula for all statistics. The jackknife and BRR, 
however, are strictly applicable only to those designs in which the psu’s are sampled with replace- 
ment (or the first-stage sampling fractions are negligible). The bootstrap seems to be more gen- 
erally applicable, but it is computationally more cumbersome and its properties have not yet 
been fully examined. 


Linearization method 


If we denote the variance estimator of $\hat{Y} = \hat{Y}(y_t)$ for a general design as $v(y_t)$, the
linearization method provides a variance estimator for a nonlinear statistic $\hat{\theta}$ as $v(z_t)$ for a
suitably defined synthetic variable $z_t$ which depends on the form of $\hat{\theta}$. For a general statistic
$\hat{\theta} = g(\hat{Y})$, the variance estimator is given by


$$s_L^2(\hat{\theta}) = v(z_t) \quad \text{with} \quad z_t = \sum_i y_{ti}\, g_i(\hat{Y}), \tag{3.3}$$


(Woodruff 1971), where $y_{ti}$ is the value of the $i$th character for the $t$th unit, and $g_i(\hat{Y})$ is the partial
derivative $\partial g(Y)/\partial Y_i$ evaluated at $Y = \hat{Y}$ ($i = 1, \ldots, q$). One drawback of the formula
(3.3) is that the evaluation of partial derivatives may be difficult in some cases, although useful
approximations to the desired partial derivatives can be obtained using numerical methods
(Woodruff and Causey 1976). The variance estimator can also be obtained in many cases,
without actually evaluating the partial derivatives $g_i$, by recasting $\hat{\theta}$ as a ratio-type statistic and
using the usual variance formula for a ratio. For example, the sample regression coefficient $\hat{B}$
may be expressed as $\hat{B} = \hat{Y}(z_{1t})/\hat{Y}(z_{2t})$ with $z_{1t} = (y_t - \hat{\bar{Y}})(x_t - \hat{\bar{X}})$ and $z_{2t} = (x_t - \hat{\bar{X}})^2$,
so that


$$s_L^2(\hat{B}) = v(z_{1t} - \hat{B} z_{2t})/[\hat{Y}(z_{2t})]^2. \tag{3.4}$$


Similar techniques can be used for other statistics like the multiple regression coefficients (Fuller 
1975; Folsom 1974). Binder (1983) extended the scope of linearization method to statistics 
defined implicitly as the solution of a set of nonlinear equations. His formulation covers finite 
population parameters derived from generalized linear models which include the linear regres- 
sion model and the logistic regression model. 
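
The ratio device described above can be sketched for the simplest case, a ratio of two estimated totals, taking the with-replacement estimator (3.1) as the design variance estimator $v$. Both the choice of $v$ and the data layout (dictionaries of psu-level values $r_{hi}$ for the numerator and denominator variates) are assumptions made for this illustration.

```python
def v_total(r):
    """Variance estimator (3.1) applied to psu-level values grouped by stratum."""
    variance = 0.0
    for values in r.values():
        n_h = len(values)
        r_bar = sum(values) / n_h
        variance += sum((v - r_bar) ** 2 for v in values) / (n_h * (n_h - 1))
    return variance


def ratio_linearization(ry, rx):
    """Ratio of two estimated totals and its linearization variance estimator."""
    Y_hat = sum(sum(vals) / len(vals) for vals in ry.values())
    X_hat = sum(sum(vals) / len(vals) for vals in rx.values())
    R_hat = Y_hat / X_hat
    # Synthetic variable: numerator value minus R_hat times denominator value,
    # divided by the estimated denominator total.
    rz = {
        h: [(a - R_hat * b) / X_hat for a, b in zip(ry[h], rx[h])]
        for h in ry
    }
    return R_hat, v_total(rz)


R_hat, var_R = ratio_linearization(
    {"h1": [10.0, 14.0], "h2": [20.0, 20.0]},
    {"h1": [5.0, 5.0], "h2": [10.0, 10.0]},
)
```

The same pattern gives (3.4) once $z_{1t}$ and $z_{2t}$ are used as the numerator and denominator variates.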


Resampling methods 


We now turn to resampling methods for the commonly used stratified multistage design 
of Section 3.1. Letting $\hat{\theta}^{(hi)}$ be the estimator of $\theta$ computed from the sample $\{r_{hi}\}$ after
omitting $r_{hi} = \hat{Y}_{hi}/p_{hi}$, a jackknife variance estimator of $\hat{\theta} = g(\sum_h \bar{r}_h)$ is given by


$$s_J^2(\hat{\theta}) = \sum_h \{(n_h - 1)/n_h\} \sum_i (\hat{\theta}^{(hi)} - \hat{\theta})^2. \tag{3.5}$$


Several variations of (3.5) can be obtained; for instance, $\hat{\theta}$ in (3.5) may be replaced by
$\hat{\theta}^{(h)} = \sum_i \hat{\theta}^{(hi)}/n_h$.

McCarthy (1969) proposed the BRR method for the important special case of $n_h = 2$. A
set of J ‘‘balanced’’ half-samples is formed by deleting one psu in the sample from each stratum. 
This set may be constructed from Hadamard matrices. The BRR variance estimator is given by 


$$s_{BRR}^2(\hat{\theta}) = \sum_j (\hat{\theta}^{(j)} - \hat{\theta})^2/J, \tag{3.6}$$


where $\hat{\theta}^{(j)}$ is the estimator computed from the j-th half sample. Again, several variations of
(3.6) can be obtained. The BRR method has been extended recently to the general case of
unequal $n_h$, using asymmetrical orthogonal arrays (Gupta and Nigam 1987; Wang and Wu 1988).
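
The jackknife (3.5) and the BRR estimator (3.6) for $n_h = 2$ can be sketched as follows. The data layout (a list of strata, each a list of tuples $r_{hi}$), the function names, and the use of a Sylvester-type Hadamard matrix to build the balanced half-samples are choices made for this illustration, not a description of any particular program.

```python
def estimate(r, g):
    """theta-hat = g(sum over strata of the stratum means of the r_hi vectors)."""
    q = len(r[0][0])
    totals = [sum(sum(v[k] for v in s) / len(s) for s in r) for k in range(q)]
    return g(totals)


def jackknife_variance(r, g):
    """Jackknife variance estimator (3.5), deleting one psu at a time."""
    theta = estimate(r, g)
    variance = 0.0
    for h, stratum in enumerate(r):
        n_h = len(stratum)
        if n_h < 2:
            raise ValueError("each stratum needs at least two psu's")
        ss = 0.0
        for i in range(n_h):
            reduced = r[:h] + [stratum[:i] + stratum[i + 1:]] + r[h + 1:]
            ss += (estimate(reduced, g) - theta) ** 2
        variance += (n_h - 1) / n_h * ss
    return variance


def sylvester_hadamard(min_order):
    """Hadamard matrix of order 2**k >= min_order (Sylvester construction)."""
    H = [[1]]
    while len(H) < min_order:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def brr_variance(r, g):
    """BRR variance estimator (3.6) for designs with exactly two psu's per stratum."""
    if any(len(s) != 2 for s in r):
        raise ValueError("BRR as sketched here requires n_h = 2")
    theta = estimate(r, g)
    H = sylvester_hadamard(len(r) + 1)   # column 0 (all +1) is skipped below
    total = 0.0
    for row in H:
        half = [[s[0] if row[h + 1] == 1 else s[1]] for h, s in enumerate(r)]
        total += (estimate(half, g) - theta) ** 2
    return total / len(H)


strata = [[(10.0,), (14.0,)], [(20.0,), (20.0,)]]
v_jack = jackknife_variance(strata, lambda t: t[0])
v_brr = brr_variance(strata, lambda t: t[0])
```

For the linear statistic in the example both functions reproduce the value given by (3.1); for nonlinear `g` they generally differ slightly.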

The bootstrap method for the stratified design involves the following steps (Rao and Wu
1988): (i) Draw a simple random sample $\{r^*_{hi}\}_{i=1}^{m_h}$ of size $m_h$, with replacement from $\{r_{hi}\}_{i=1}^{n_h}$,
independently for each $h$. Calculate


$$\tilde{r}_{hi} = \bar{r}_h + [m_h/(n_h - 1)]^{1/2}(r^*_{hi} - \bar{r}_h), \qquad \tilde{r}_h = m_h^{-1} \sum_i \tilde{r}_{hi}$$


and $\tilde{\theta} = g(\sum_h \tilde{r}_h)$. (ii) Independently replicate step (i) a large number, $B$, of times and calculate
the corresponding estimators $\tilde{\theta}^1, \ldots, \tilde{\theta}^B$. (iii) The bootstrap variance estimator of $\hat{\theta}$ is given by


$$s_{BOOT}^2(\hat{\theta}) = \sum_b (\tilde{\theta}^b - \hat{\theta})^2/(B - 1). \tag{3.7}$$


Confidence intervals can also be obtained by approximating the distribution of $t = (\hat{\theta} - \theta)/s_J(\hat{\theta})$
by its bootstrap counterpart $t^* = (\tilde{\theta} - \hat{\theta})/s_J^*(\tilde{\theta})$, where $s_J^{*2}(\tilde{\theta})$ is obtained from $s_J^2(\hat{\theta})$
by jackknifing the particular bootstrap sample $\{\tilde{r}_{hi}\}$. Two-sided $1 - \alpha$ level ‘‘bootstrap-t’’
confidence intervals on $\theta$ are then given by


$$\{\hat{\theta} - t^*_{UP}\, s_J(\hat{\theta}),\ \hat{\theta} - t^*_{LOW}\, s_J(\hat{\theta})\}, \tag{3.8}$$


where $t^*_{LOW}$ and $t^*_{UP}$ are the lower and upper $\alpha/2$ points of $t^*$ obtained from the bootstrap
histogram of $t^{*1}, \ldots, t^{*B}$. One-sided confidence intervals can also be obtained from the
bootstrap histogram. Also, one could use the linearization variance estimator instead of the
jackknife variance estimator in constructing the confidence intervals. For confidence intervals
we need a much larger number, $B$, of bootstrap samples than for variance estimation. Regarding
the choice of bootstrap sample sizes $m_h$, the choice $m_h = n_h - 1$ is attractive since it gives
$\tilde{r}_{hi} = r^*_{hi}$.
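
Steps (i) to (iii) of the bootstrap can be sketched as below, with the choice $m_h = n_h - 1$ as the default. The function names, the data layout (a list of strata, each a list of tuples $r_{hi}$) and the `seed` argument are assumptions made for this illustration.

```python
import random


def estimate(r, g):
    """theta-hat = g(sum over strata of the stratum means of the r_hi vectors)."""
    q = len(r[0][0])
    totals = [sum(sum(v[k] for v in s) / len(s) for s in r) for k in range(q)]
    return g(totals)


def rescaled_bootstrap_variance(r, g, B=1000, seed=0):
    """Bootstrap variance estimator (3.7) with m_h = n_h - 1 in every stratum."""
    rng = random.Random(seed)
    theta = estimate(r, g)
    replicates = []
    for _ in range(B):
        boot = []
        for stratum in r:
            n_h = len(stratum)
            m_h = n_h - 1
            q = len(stratum[0])
            r_bar = [sum(v[k] for v in stratum) / n_h for k in range(q)]
            scale = (m_h / (n_h - 1)) ** 0.5      # equals 1 when m_h = n_h - 1
            draws = [rng.choice(stratum) for _ in range(m_h)]
            # Step (i): rescale each resampled vector about the stratum mean.
            boot.append([
                tuple(r_bar[k] + scale * (d[k] - r_bar[k]) for k in range(q))
                for d in draws
            ])
        replicates.append(estimate(boot, g))       # theta-tilde for this replicate
    # Step (iii): spread of the replicates about the full-sample estimate.
    return sum((t - theta) ** 2 for t in replicates) / (B - 1)


strata = [[(10.0,), (14.0,)], [(20.0,), (26.0,)]]
v_boot = rescaled_bootstrap_variance(strata, lambda t: t[0], B=4000, seed=1)
```

For a linear statistic the result should lie close to the value of (3.1), up to Monte Carlo error that shrinks as $B$ grows.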


Comparison of the methods 


Theoretical properties of the methods reported in the literature include the following: 
(1) All the variance estimators reduce to the ‘‘standard’’ one, $s^2(\hat{Y})$ given by (3.1), in the
linear case $g(\hat{Y}) = \hat{Y}$. (2) For smooth functions $g(\hat{Y})$, all the variance estimators are
asymptotically design consistent (Krewski and Rao 1981). The jackknife variance estimator,
however, is known to be inconsistent for nonsmooth functions like the quantiles, even in the case
of simple random sampling. Hence, caution should be exercised in using jackknife software.
(3) If $n_h = 2$ for all $h$, then the jackknife and linearization variance estimators are
asymptotically equal to higher order terms for smooth functions $g(\hat{Y})$, indicating that the choice between
these methods in this important special case should depend more on other considerations like
computational costs (Rao and Wu 1985). Turning to empirical studies, Kish and Frankel
(1974) studied the linearization, jackknife and BRR methods, using data from the Current
Population Survey and sample designs with $n_h = 2$ clusters from each of $L = 6$, 12 and 30
strata. They evaluated the empirical coverage probability of the $1 - \alpha$ level confidence
intervals, $\hat{\theta} \pm t_{\alpha/2}\, s(\hat{\theta})$, for ratios, regression and correlation coefficients, where $t_{\alpha/2}$ is the
upper $\alpha/2$-point of a t-variable with $L$ degrees of freedom and $s^2(\hat{\theta})$ is any one of the variance
estimators. The BRR method performed consistently better, in terms of coverage probability,
than the jackknife which in turn was better than the linearization method; the observed dif- 
ferences were small for ratios. The methods performed in the reverse order with regard to 
stability of variance estimator. Other empirical studies in the literature reported similar results. 
Regarding the bootstrap, a simulation study by Kovar, Rao and Wu (1988) indicates that the
bootstrap t-intervals track the nominal error rate in each tail better than the intervals based
on the normal approximation to $t = (\hat{\theta} - \theta)/s(\hat{\theta})$, but the bootstrap variance estimators are
less stable than those based on the linearization or the jackknife. The second order equivalence
of the latter two variance estimators for the special case $n_h = 2$ is also confirmed.

Computationally simpler methods of variance estimation than the previous methods have 
also been proposed in the literature, e.g., random group method and partially balanced repeated 
replication, but these variance estimators do not reduce to the ‘‘standard’’ one in the linear 
case. Methods of constructing models from which sampling errors can be imputed have also 
been proposed. Such methods are useful in producing ‘‘smoothed’’ standard errors for 
estimators for which direct computations have not been made, and also in presenting stan- 
dard errors in a concise form (e.g., graphs) in published reports. 

Wolter’s (1985) book gives an excellent introduction to recent developments in variance 
estimation, and illustrates the methods on data from a variety of large-scale surveys. Recent 
review papers on variance estimation include Rust (1985) and Rao (1988). 


4. ANALYSIS OF SURVEY DATA 


Standard methods of data analysis are, in general, based on the assumption of simple random 
sampling. These methods have also been implemented in standard statistical packages, including 
SPSS-X, BMDP and SAS. Application of standard methods to survey data without some
adjustment for survey design, however, can lead to erroneous inferences, since most such data 
are obtained from complex sample surveys involving clustering, stratification and unequal pro- 
bability sampling, and as a result do not satisfy the assumption of simple random sampling. 
In particular, standard errors of parameter estimates and associated confidence intervals can 
be seriously understated if the effect of design is ignored in the analysis of data. Similarly, the 
actual type I error rates of tests of hypotheses can be much bigger than the nominal levels. 
Standard exploratory data analyses, such as residual analysis to detect model deviations, are 
also affected. Kish and Frankel (1974) and others drew attention to some of these problems 
with standard methods and emphasized the need for new methods that take proper account 
of the complexity of survey data. During the past 10 years or so, rapid progress has been made 
in developing such methods for the following types of analyses: (a) analysis of multi-way con- 
tingency tables; (b) analysis of domain means or domain proportions; (c) linear regression 
analysis; (d) multivariate analysis including principal component analysis and factor analysis. 
A brief account of some of these developments is given in this section, and the reader is referred 
to review articles by Nathan (1988), Rao (1987) and Smith (1984), and a book edited by C.J. 
Skinner, D. Holt and T.M.F. Smith (1989).


4.1 Analysis of Multi-way Contingency Tables


Chi-squared tests (or likelihood ratio tests) are frequently used for the evaluation and selec- 
tion of parsimonious models on p, the population cell probabilities, in a multi-way contingency 
table with T cells. For this purpose, loglinear models are convenient because of their close 
similarity to analysis of variance in systematically providing test statistics of various hypotheses 
associated with a multi-way table. Rao and Scott (1984) made a systematic study of the impact 
of survey design on the standard chi-squared test of goodness-of-fit of a loglinear model, 
denoted by $X^2$. They showed that $X^2$ is asymptotically distributed as a weighted sum, $\sum \delta_i W_i$,
of $T - r - 1$ independent $\chi_1^2$ variables $W_i$, where the weights $\delta_i$ are the eigenvalues of a
‘‘generalized design effects’’ matrix and $T - r - 1$ is the degrees of freedom. This general
result shows that the survey design can have a substantial impact on the type I error rate of $X^2$.
For instance, under a constant design effects clustering model, $\delta_i = \lambda$ for all $i$, the actual type
I error rate, for nominal level $\alpha$, is approximately given by $\Pr[\chi^2_{T-r-1} > \lambda^{-1}\chi^2_{T-r-1}(\alpha)]$
which increases with the clustering effect, $\lambda$.
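
The size of this inflation is easy to compute. The sketch below evaluates $\Pr[\chi^2_{\nu} > \lambda^{-1}\chi^2_{\nu}(\alpha)]$ using only the standard library; to keep it short it handles even degrees of freedom $\nu$ only, where the chi-squared tail has a closed form. That restriction, and the function names, are simplifications introduced here.

```python
import math


def chi2_sf_even(x, df):
    """P(chi-squared with df degrees of freedom > x), for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)**j / j!
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, df):
    """Upper alpha point of the chi-squared distribution, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def actual_type1_error(alpha, df, lam):
    """Actual type I error of the uncorrected test when every delta_i equals lam."""
    return chi2_sf_even(chi2_critical_even(alpha, df) / lam, df)


# Nominal 5% test on 4 degrees of freedom with a common design effect of 2.
rate = actual_type1_error(0.05, 4, 2.0)
```

With `lam = 1` the function returns the nominal level, and the returned rate grows as `lam` increases, in line with the statement above.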

Rao and Scott (1984, 1987) obtained simple first-order corrections to $X^2$ which can be
computed from published tables that include estimates of design effects (or standard errors) for cell
estimates $\hat{p}$ and their marginal totals, thus facilitating secondary analyses (see also Fellegi 1980,
Gross 1984, and Bedrick 1983). A first-order correction refers $X^2/\hat{\delta}_.$ to $\chi^2_{T-r-1}$, where $\hat{\delta}_.$ is
an estimate of the average design effect $\delta_. = \sum \delta_i/(T - r - 1)$ or an estimate of an upper
bound on $\delta_.$. The corrected test is asymptotically valid in the case of constant design effects
clustering, and in general it should perform well when the variability of the $\delta_i$’s is small. More
accurate, second-order corrections that take account of the variability in the $\delta_i$’s can also be
obtained by using the Satterthwaite approximation to the weighted sum of independent $\chi_1^2$
variables (Rao and Scott 1984). These tests, however, require the knowledge of a full estimated
covariance matrix of $\hat{p}$. Alternative methods that take account of the survey design include
the Wald statistics based on weighted least squares (Koch, Freeman and Freeman 1975) and
the jackknife chi-squared tests (Fay 1985). The latter tests are applicable to survey designs
permitting the use of a replication method, such as the jackknife or the BRR. The Wald tests
require the full estimated covariance matrix of $\hat{p}$, whereas the jackknife tests require access
to cluster-level estimates.
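
The two corrections can be sketched as follows, taking the estimated design effects $\hat{\delta}_i$ as given. The second-order part uses the standard Satterthwaite moment matching, in which the first-order statistic is divided by $1 + \hat{a}^2$ and referred to a chi-squared distribution with $(T - r - 1)/(1 + \hat{a}^2)$ degrees of freedom, where $\hat{a}$ is the coefficient of variation of the $\hat{\delta}_i$. The function name and return layout are assumptions of this sketch; in practice the $\hat{\delta}_i$ require the full estimated covariance matrix of $\hat{p}$.

```python
def rao_scott_corrections(x2, deltas):
    """First- and second-order corrected statistics with their degrees of freedom.

    x2     : the standard chi-squared statistic X^2
    deltas : estimated generalized design effects (T - r - 1 values)
    """
    k = len(deltas)                      # T - r - 1
    d_bar = sum(deltas) / k              # average design effect
    # Squared coefficient of variation of the design effects.
    a2 = sum((d - d_bar) ** 2 for d in deltas) / (k * d_bar ** 2)
    first = x2 / d_bar                   # refer to chi-squared on k df
    second = first / (1.0 + a2)          # refer to chi-squared on k / (1 + a2) df
    return first, k, second, k / (1.0 + a2)


first, df1, second, df2 = rao_scott_corrections(8.0, [1.0, 3.0])
```

When all the $\hat{\delta}_i$ are equal, $\hat{a}^2 = 0$ and the two corrections coincide, which matches the remark that the first-order correction performs well when the design effects vary little.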

Fay (1985) and Thomas and Rao (1987) showed that the Wald test which refers to $\chi^2_{T-r-1}$,
although asymptotically correct, can become highly unstable as the number of cells in the
multi-way table increases and the number of sample clusters decreases, leading to
unacceptably high type I error rates compared to the nominal level, $\alpha$. On the other hand, Fay’s
jackknife tests and the Rao-Scott corrections performed well under quite general conditions. A
simple modification to the Wald test which refers to an F distribution on $T - r - 1$ and
$f - T + r + 2$ degrees of freedom performed better than the Wald test in controlling the type
I error rate, where $f$ is the degrees of freedom for estimating the covariance matrix of $\hat{p}$.


4.2 Analysis of Domain Means or Domain Proportions 


Analysis of domain (or subpopulation) proportions associated with a binary response 
variable is of considerable interest to researchers in social and health sciences, and other sub- 
ject matter areas. Logistic regression models are extensively used for this purpose in conjunc- 
tion with standard statistical methods for binomial proportions. Rao and Scott (1987) obtained 
simple first-order corrections to standard chi-squared tests of goodness-of-fit and of nested 
hypotheses which can be computed from published tables that include estimates of design effects 
(or standard errors) of domain proportions. Roberts, Rao and Kumar (1987) derived more
accurate second-order corrections to standard tests, but these require access to a full estimated
covariance matrix of domain proportions. Diagnostics for detecting outlying domain propor- 
tions and influential points in the factor space were developed as well, again taking the sampling 
design into account. 

Koch, Freeman and Freeman (1975) used weighted least squares methods to analyze domain 
means of a quantitative variable, y, and developed Wald tests of goodness-of-fit of the model 
and of linear hypotheses on the model parameters. The performance of Wald tests can be
improved, as in Section 4.1, by using an F-modification. 


4.3 Linear Regression Analysis 


In Section 3.2, we considered design-based inferences on nonlinear, finite population 
parameters such as the finite population simple regression coefficient $B$. The pivotal
$t = (\hat{B} - B)/s(\hat{B})$ is approximately $N(0,1)$, where $\hat{B}$ is the design-consistent estimator, (3.2), of
$B$, and its standard error, $s(\hat{B})$, can be obtained either through the linearization method as
in (3.4) or by using one of the replication methods. This approach readily extends to multiple
regression coefficients. The design-weighted estimator $\hat{B}$ or its multiple regression analogue
can be obtained by the weighted regression option of standard packages by using the survey
weights attached to the sample elements as the weights in the regression. However, the
standard error of $\hat{B}$ resulting from this routine remains incorrect.

Some people argue that most users are concerned with inferences on parameters of an 
appropriate superpopulation model rather than inferences on finite population parameters like 
$B$. However, the interest in $B$ can also be justified by considering it as the least squares estimator
of the superpopulation parameter $\beta$ in the model


$$y_i = \alpha + \beta x_i + e_i \quad \text{with} \quad E_m(e_i) = 0, \quad i = 1, \ldots, N. \tag{4.1}$$


If the population size is large, then estimating $B$ is effectively equivalent to estimating $\beta$, while
if the model (4.1) is misspecified to the extent of making $\beta$ meaningless, then $B$ may still be
of interest as the slope of the least squares line fitted to the $N$ pairs $(y_i, x_i)$ (Godambe and
Thompson 1986). 

Scott and Holt (1982) used a model-dependent approach to investigate the effect of two- 
stage sampling on standard regression analysis. They assumed a regression model of the form 
(4.1) with equi-correlated error terms $e_i$ within each cluster, as in Fuller (1975). This model
also holds for the sample pairs $(y_i, x_i)$, $i \in s$, if the selection probabilities are not related to the
dependent variable, as in the case of two-stage random sampling. The results of Scott and Holt
indicate that, under a positive intra-cluster correlation, standard analysis understates the standard errors
of parameter estimates, and consequently inflates the type I error rates of customary tests. Wu,
Holt and Holmes (1988) made a systematic study of the effect of two-stage sampling on the
customary F-statistic, and proposed a correction for the F test for unknown intra-cluster
correlation, as an alternative to the iterative generalized least squares (GLS) procedure. Both the GLS
procedure and the F-correction require known cluster labels which may not be available when
the survey data are used for secondary analysis.

If the regression model includes all the design variables z related to the dependent variable, 
such as stratum indicator variables and size measures of units, and the errors $e_i$ are
independent with a constant variance $\sigma^2$, then standard regression analysis is valid under the
model-dependent approach (Pfeffermann and Smith 1985). However, such models may involve too
many parameters to be useful. Also, the design variables may not be of intrinsic interest to
the user, or may not be available in secondary analysis. In such situations, we are often interested
in models of the form (4.1), where $x$ is not a design variable. The sample pairs $(y_i, x_i)$, $i \in s$,
however, may not satisfy the model due to sample selection bias. Nathan and Holt (1980)
proposed an adjusted regression approach to take account of selection bias, and compared it with
ordinary least squares and the design-based approach based on $\hat{B}$ and $s(\hat{B})$. The adjusted regression approach
assumes specific relationships between the regression variables and the design variables. Their
empirical results indicate that ordinary least squares inferences can be highly unreliable, that
the design-based approach is basically reliable except under extreme selection schemes, and
that the adjusted regression approach performs well. Pfeffermann and Holmes (1985) studied the
robustness of these procedures to misspecification of relationships between the regression
variables, and concluded that the adjusted regression approach is very sensitive to model
misspecification. The design-weighted estimator $\hat{B}$ is robust, but a more efficient estimator
is obtained by modifying the adjusted regression estimator to be design-consistent for the finite 
population regression coefficient, B. 


4.4 Multivariate Analysis


The methods in Section 4.2 for the analysis of domain means can be extended to the 
multivariate case of domain mean vectors, but no detailed studies of such extensions have been 
reported in the literature. The literature on multivariate analysis of survey data is largely devoted
to the analysis of covariance structures, in particular to principal component analysis and factor 
analysis. Bebbington and Smith (1977), Tortora (1980) and Skinner, Holmes and Smith (1986) 
investigated the effect of sample design on standard principal component analysis. Their results 
indicate that the application of standard methods, without some adjustment for the sample 
design, can lead to erroneous inferences. In particular, the estimators of eigenvalues and 
eigenvectors of the covariance matrix, $\Sigma$, can be severely biased for non-self-weighting
sample designs. Skinner, Holmes and Smith (1986) proposed maximum likelihood (ML) 
estimators, under a multivariate normal model, and probability-weighted (or design-based) 
estimators, to adjust for the effects of the sample design. Their simulation study indicates that 
both estimators perform well unconditionally, while the probability-weighted estimators exhibit 
a conditional model bias. The ML estimators, however, may be sensitive to model misspecifica- 
tion. A probability-weighted version of the ML estimators may be more robust, as demonstrated 
by Pfeffermann and Holmes (1985) in the context of the adjusted regression approach (Section
4.3). Fuller (1987) derived design-based estimators of the parameters in factor analysis, and
the estimated covariance matrix of the estimators. He showed that the estimated variances based 
on normal theory can seriously underestimate the true variances of the factor estimators. 


5. COMPUTER SOFTWARE 


Several computer package programs for variance estimation in complex surveys were 
developed in the mid to late 1970’s, often in conjunction with programs for regression analysis 
of survey data. Wolter (1985, pp. 393-412) reviewed the latest versions of these programs to 
about 1985. Among the programs listed by Wolter, the ones most commonly used are 
CLUSTERS (Verma and Pearce 1977), the programs &PSALMS and &REPERR in the OSIRIS 
IV system (Vinter 1980 and Lepkowski 1982), SUDAAN (Shah 1981a, 1981b, 1982 and Holt 
1979), HESBRR (Jones 1983) and SUPER CARP (Hidiroglou, Fuller and Hickman 1980). 
The programs HESBRR and the OSIRIS IV program &REPERR use balanced repeated replica- 
tion as the variance estimation technique; the remaining three use the Taylor linearization 
method.

Cohen, Burt and Jones (1986) evaluated the variance estimation programs for means and
ratios, with the exception of CLUSTERS, using a large data set from the National Medical 
Care Expenditure Survey. They found that the programs SESUDAAN and RATIOEST in the 
SUDAAN collection were the most efficient in terms of CPU time usage and easier to program 
than the others. 

One major current trend in software development is the development of menu-driven 
packages on micro-computers. Variance estimation and specialized survey analysis software 
is no exception to this trend. A notable enhancement to the commonly used variance
estimation programs since 1985 is the introduction of PC CARP (Schnell et al. 1986 and Schnell
et al. 1988), available on IBM AT/XT or compatible micro-computers with a math
co-processor. This package, like its predecessor SUPER CARP, uses Taylor linearization methods
for variance estimation. A second variance estimation package is also available on micro- 
computers. The package listed as BELLHOUSE in Wolter (1985, p. 399) has been adapted 
for IBM micros with or without a co-processor by Rylett and Bellhouse (1988) under the pro- 
gram name TREES. This software uses tree structures to mimic the structure of stratified 
multistage sampling designs and applies tree traversal algorithms, in conjunction with general 
results on variance estimation in multi-stage sampling (see section 3.1), to the calculation of 
variance estimates. 

A second trend in the computer implementation of survey variance estimation and survey 
analysis techniques is the integration of survey software with widely used statistical analysis 
systems. A leader in this trend from the early 1980’s is the SUDAAN system, which
comprises several SAS procedures. Freeman et al. (1985) and Hidiroglou and Paton
(1987) both used the PROC MATRIX procedure in SAS to obtain survey variance estimates, 
the former by balanced repeated replication and the latter by Taylor linearization. Mohadjer 
et al. (1986) report the development of a new SAS procedure WESVAR to obtain survey 
variance estimates by balanced repeated replication. 

A variety of packages and computing techniques are available to carry out the analyses of 
survey data reviewed in Section 4. Among the available specialized packages, the most
comprehensive appears to be PC CARP. The original program, SUPER CARP, was designed
to carry out regression analyses developed by Fuller (1975); the PC version retains this option. 
The current version now contains additional options for categorical data analysis, and inferences 
on cumulative distribution function and associated quantiles, following methods given by Fran- 
cisco and Fuller (1986). For categorical data, there is an option for the analysis of two-way 
contingency tables, based on the Rao-Scott corrections to the chi-squared test of independence. 
The program can also be manipulated to perform factor analyses of survey data. 

There are four other specialized packages for the analysis of survey data; between them 
they cover topics in regression and categorical data analysis. The &REPERR program in 
OSIRIS IV and the SURREGR procedure in SUDAAN both calculate standard errors of 
regression coefficients so that regression analyses can be carried out. The programs CPLX, 
developed by Fay (1982), and RSPLX, also by Fay, handle categorical data analyses of log- 
linear models for two and multi-way tables. The analysis in CPLX is carried out using jack- 
knifed chi-square statistics, while RSPLX applies second order Rao-Scott corrections to the 
usual test statistic. 

The four programs for the regression analyses for complex survey data were evaluated 
by Cohen, Xanthopoulis and Jones (1988). The older version, SUPER CARP, was included 
in this analysis rather than PC CARP. Similar to the earlier study of Cohen, Burt and Jones 
(1986) on variance estimation, data from the National Medical Care Expenditure Survey were 
used. Once again, a program in the SUDAAN suite of programs, SURREGR, was the most 
efficient in terms of CPU time usage and easier to program than the others. However, the 
efficiency of the SUDAAN programs might be balanced by the flexibility of the PC CARP 
program, depending upon the survey analysis required. 

Significant enhancements to SUDAAN are provided in the new SUDAAN system under 
development (LaVange et al. 1989). Variance estimation and data analysis methods not available 
in SUDAAN are among the many modifications incorporated into the new SUDAAN System. 

Running almost parallel to the emerging trend in the calculation of variance estimates, 
there is a move towards incorporating methods for the analysis of complex survey data 
into standard statistical packages and systems. Following on their variance estimation methods 
using SAS procedures, Hidiroglou and Paton (1987) describe further SAS procedures to 
carry out log-linear analyses, with Rao-Scott corrections, of multi-way contingency tables. 
Likewise, Freeman (1988) notes that he used the SAS procedure PROC MATRIX for both 
variance estimation and for the analysis of variance of his survey data. Similarly, Mohadjer 
et al. (1986) describe two other new SAS procedures in addition to the variance estimation 
procedure WESVAR. These are the previously mentioned NASSREG and NASSLOG which 
carry out weighted least squares regression analyses and logistic regression analyses respec- 
tively. Both procedures depend on balanced repeated replication for variance estimation 
of the model parameters. An alternative approach to using SAS procedures is to use the 
matrix algebra language GAUSS (Platt 1986). Based on their own experience, Rao and 
Thomas (1988) favorably report on the use of this language for categorical data analysis in 
complex surveys. 


6. CONCLUDING REMARKS 


The early milestones in the development of efficient sampling designs and associated estima- 
tion techniques for population totals and means have firmly established sample survey theory 
and methods as a major discipline in statistics. Subsequent developments in theoretical foun- 
dations of sampling theory have provided useful insights into inferential aspects. In particular, 
the model-assisted approach and the conditional design-based approach appear to be promising 
since they attempt to fill the ‘‘gap’’ between the traditional approach and the model-dependent 
approach by retaining the desirable features of both approaches, but more research is needed 
in this area to handle complex sampling designs. Recent advances in variance estimation and 
confidence intervals for nonlinear statistics, and the associated computer software, are
equally impressive. It is also gratifying that rapid progress has been made in the development 
of methods for the analysis of survey data that take account of the complexity of the sampling 
design, and the associated computer software. 

We can expect to see important new developments in the next 10 years or so in the areas 
of variance estimation for nonlinear statistics (especially, nonsmooth functions), analysis of 
survey data (especially, multivariate analysis), and other topics not covered here (especially, 
sampling in time and small area estimation). 
COMMENT 
T.M. FRED SMITH 


Sample surveys are one of the most important areas of the application of statistics. The paper 
by Professors Rao and Bellhouse is an excellent review of the theoretical development of sample 
surveys and I find it hard to be critical; but in the best traditions of the Royal Statistical Society 
I shall make the attempt in as constructive and controversial a manner as possible. In any review 
paper the choice of topics, especially relating to recent work, must be to some extent subjec- 
tive. This affords a discussant an easy target; criticize the authors for their sins of omission. 
Also a review must be wide ranging and this allows discussants freedom to ride their own hobby 
horses over the range. I shall adopt both approaches, and my objective in so doing is to identify
some additional issues which I believe are important, thus widening the review still further. 

There is now general agreement about the milestones of our subject. These are associated 
with the names of Kiaer, Bowley, Neyman, Cochran, Hansen, Hurwitz, Madow, Mahalanobis, 
Horvitz and Thompson - an international collection dominated latterly by contributions from 
the USA. Kiaer and Bowley’s work was fundamental because they demonstrated that valid 
conclusions could be drawn from representative samples of quite small size drawn from large 
populations with arbitrary values. Representative samples were stratified samples with pro- 
portional allocation, and Bowley derived the appropriate theoretical results. Neyman and subse- 
quent authors argued the case for random sampling and developed a comprehensive theory 
of randomisation inference applicable to most sampling schemes. Durbin (1953) completes the 
theory with his multi-stage sampling results. Despite the importance of these results sample 
surveys became a Cinderella subject on the fringes of mainstream statistics, and even today 
most university departments do not have a sampling statistician on their staff. Why is this? 

One reason is that sample survey theory has developed mainly within social science and 
official government statistics, whereas most statisticians have a training within mathematics 
and physical science. Although all experimental scientists deal with samples very few seem to 
recognise this explicitly and those that do, such as geologists and biologists, have developed 
their own theory of sampling and estimation. In my view it is time to bring together sampling 
experts from all areas of scientific enquiry to share ideas and experiences and hopefully to estab- 
lish a global theory of sample surveys. 

A second reason is that sample survey theory starts with a population which is a real fixed finite 
population of units. Samples are then drawn from this population according to specific rules. 
In most scientific enquiries the position is reversed; the population is not well defined and the 
scientist starts with a sample. One view of the role of the statistician, as enunciated, for example,
by R.A. Fisher, is to define the hypothetical population from which the sample data can be viewed
as a random sample. This approach begs the question whether this hypothetical population 
has any scientific value. Arguably the sample survey approach of starting with the population 
has much to commend it. 

A third reason is that since the finite population units can take arbitrary values the popula- 
tion cannot be summarized by a few parameters. Notions like sufficiency have little value in 
sample survey theory, and sample data are usually summarized by a mass of cross-tabulations. 
The estimation of a large number of cell proportions is the primary aim of sample surveys and 
the object of inference is usually descriptive rather than explanatory. 

A final reason for the separation of sample surveys from mainstream statistics is that the 
randomisation theory of sample surveys is so complete. It is a closed theory which if accepted 
has few remaining problems to be solved. The chief concerns of randomisation researchers 
since Horvitz and Thompson (1952) provided the general theoretical framework have been the 
construction of πps sampling schemes with non-zero joint inclusion probabilities, the production
of methods and programs for variance estimation, and the construction of estimators which 
employ auxiliary information but can never be generally efficient because of Godambe’s result. 
All of these problems are important, but they are not exciting; they lack the philosophical and 
mathematical depth to capture the imaginations of young mathematical statisticians. 


These reasons are my explanation why sample surveys have been seen in the past as an 
activity on the fringe of mainstream statistics. The position is changing now and I detect 
a coming together of the branches of statistics. Much recent work in sample surveys has 
attempted to integrate surveys into mainstream statistics and many areas of statistics now 
recognise the importance of selection effects. Has the sample survey Cinderella been invited 
to the Statisticians’ Ball? 


In addition to his non-existence theorem Godambe has also shown that within the randomisa- 
tion framework the likelihood is proportional to the probability of selection, p(s | z), where 
z is the prior information on which the design was based, which for fixed s is a constant. Thus 
the likelihood is completely uninformative. In the same set-up Basu (1971) showed that the 
sufficient statistic is {(i, y_i) : i ∈ s}, namely the complete data tape including the labels. Although 
these results are also negative, highlighting the distinction between randomisation inference 
and other forms of inference, they did stimulate interest amongst a wider group of statisticians 
and so had a positive value. My own interest in the theory of sample surveys was stimulated 
by Ericson (1969), in particular by the way he incorporated the uninformative likelihood into 
a positive framework via Bayes theorem and exchangeable priors. Ericson’s use of 
exchangeability deserves consideration by all statisticians, not just Bayesians. Is it reasonable, 
is it even possible, to have a valid theory of predictive inference without some form of 
exchangeability? If there is no function of the unit values which is exchangeable how can you 
predict the unobserved values from the sample values? My opinion is that Ericson’s work was 
a milestone in the development of sample survey theory. 
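
The uninformative likelihood referred to above can be written compactly. The following is a sketch in notation that does not appear in the commentary itself: η = (η_1, ..., η_N) is assumed here to denote a candidate vector of population values, d = {(i, y_i) : i ∈ s} is the observed data, and p(s | z) is the selection probability as in the text.

```latex
L(\boldsymbol{\eta}; d) \;=\;
\begin{cases}
p(s \mid z), & \text{if } \eta_i = y_i \text{ for all } i \in s, \\
0,           & \text{otherwise.}
\end{cases}
```

For a fixed sample s the likelihood is therefore flat over every population vector consistent with the data, which is the sense in which it carries no information about the unobserved values.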


The uninformative nature of the randomisation likelihood led some statisticians to ques- 
tion the role of randomisation. Godambe himself refers to ‘‘the problem of randomisation’’ 
and developed alternative theoretical approaches which required randomisation. Ericson also 
found a role for randomisation within his exchangeable set-up. He argued that if you employ 
your prior information, z, to form groups of units which are approximately exchangeable a 
priori then the use of simple random sampling will guarantee exchangeability. Royall (1970, 
1973), however, made the mistake of advocating purposive sampling within his model-based 
framework. He touched a raw nerve and brought down upon his head the wrath of the ran- 
domisation establishment. I thought that Royall had asked some serious questions which 
deserved an answer and the strength of the reaction surprised me. Why did academic survey 
samplers and those from government agencies in North America feel so strongly about ran- 
domisation? Their colleagues in market research seemed happy with quota samples which could 
be viewed as a special case of balanced sampling. In Europe many official surveys are based 
on quota samples. What is so special about official statistics in North America? 

I think the answer lies deep in the American political psyche. Thoughtful Americans are 
democratic in the true sense of that term. They believe in individual freedom and the right to 
information, they are also deeply suspicious of governments. They recognise the need within 
a democracy for reliable statistical information. To the official statisticians randomisation is 
the guarantee of the objective reliability of their data. It is a key source of their professional 
integrity and any attack on randomisation was seen as potentially dangerous however well 
intentioned. I admire this position and it has helped to convince me that randomisation is one 
of the great contributions of statistics to science. 

I have expressed myself with some feeling because I am so unhappy about the present position
of official statistics in the U.K. The tradition in the U.K. is not naturally democratic, 
we are still a monarchy, we respect authority rather than the individual. This tendency is being 
exploited and there is now a serious erosion of public confidence in the Government’s use of 
statistics. It has been argued that official statistics in the U.K. are collected to aid the deci- 
sions of government, not to help parliament or to inform the electorate. Key series have been 
stopped, definitions have been changed, information is presented by ministers in ways which 
are patently false, yet no government statistician can complain publicly because of the Official 
Secrets Act. There is a dangerous public cynicism about statistics and George Orwell’s predic- 
tions in his novel 1984 may be closer to the truth than we realise. I apologise to the authors 
for this digression, but I said I would ride some hobby horses, and the issue of the integrity 
of official statistics is of great importance. 

Before leaving randomisation theory I would like to make some comments about repeated 
surveys and rotation sampling. Again this is an area which the authors have excluded although 
they did note Patterson (1950) as a milestone paper. Randomisation theory has been devel- 
oped within the framework of the one-off cross section survey. The extension to repeated 
surveys is non-trivial, for it is difficult to retain the probability structure over time under rotation
sampling when the population changes (Fellegi 1963). For the measurement of gross flows, 
or transition probabilities, the role of the randomisation inclusion probabilities is not clear. 
The beautiful simplicity of randomisation theory for one-off surveys is destroyed when they 
are repeated over time. But most important surveys are repeated surveys, especially in the 
government sector, so what are the implications? 

As always the answer is that it depends. If the primary purpose is to produce descriptive 
statistics of the state of the system at each time period then the surveys can be considered as 
repetitions of a cross-section survey and each one can be analysed independently. Although 
composite estimators or time series estimators may be more efficient they should be viewed 
as secondary estimators rather than primary estimators. If I wanted to use repeated survey data 
within an econometric model I would prefer to input the cross-section estimates with their 
known correlation structure rather than complex composite estimates. On the other hand if 
I wanted the best estimate of the current value of, say, unemployment, for a particular pur- 
pose, not for public consumption, then I would use the most efficient procedure available. 
Similarly if I wanted to explain the change in value of some estimates over time then I would 
need to go beyond simple randomisation analysis. Thus the problems with randomisation 
inference for repeated surveys occur mainly for secondary analyses. However, there remains 
the important issue of which estimates should be reported to the public. 


Section 2 of the paper is devoted to work on the theoretical foundations of inference from 
survey data carried out during the last 30-40 years. The authors have chosen to distinguish three 
approaches, design-based, model-dependent and model-assisted, the latter being an attempt 
to find a compromise solution between the other two. Personally I prefer to go for a GUT 
(Grand Universal Theory) approach integrating both design and models into one framework. 
The important influences on my thinking in this area, in addition to Ericson, have been Scott 
(1977) and Rubin (1976). In the GUT approach the survey variables, the sampling mechanism, 
and any other selection and measurement mechanisms are all introduced explicitly into an 
overall model. If Y is the m × p matrix of measured survey variables, z is the prior information, 
s denotes the sample, and s* ⊂ s denotes the respondents, then the joint distribution of all these 
variables is 
where the survey design, represented by p(s | z), is of the so-called uninformative type such 
as random sampling. The design is uninformative because z is assumed known and includes 
all the usual information on stratification, clustering and measures of size. This general for- 
mulation forces statisticians to face up to all their assumptions. Non-response must be modelled 
explicitly. Measurement errors must be included in the structure of f(Y | z; θ) g(z; φ). The 
decision to use randomisation inference is then an explicit statement that given z the values 
of Y can be treated as unknown constants; they are arbitrary values about which we have no 
additional information. A modeller, on the other hand, must specify the model to the level 
needed for inference, for example, by an exchangeable model. Both design-based and model- 
dependent approaches condition on the same prior information, z, and so both should employ 
similar, possibly identical, structures. In fact I would rarely expect the point estimators using 
the different approaches to differ very much in practice. The issue thus becomes that iden- 
tified by the authors as the choice of a measure of uncertainty. Model-dependent procedures 
employ conditional variances, strict design-based procedures are unconditional. How to 
construct conditional design-based inferences is still an open question, but the approach of 
Robinson (1987) looks promising. The GUT model shows the design-based versus model-based 
controversy to be what it is, namely a relatively small philosophical dispute within the much 
bigger framework of total survey analysis. 
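
One way to write the kind of joint model described above is sketched below. The factors f(Y | z; θ), g(z; φ) and p(s | z) are those named in the text; the response mechanism r(s* | s, Y, z; ψ), its parameter ψ and the symbol h for the joint distribution are hypothetical notation introduced here for illustration, and the display should not be read as the author's own formula.

```latex
h(Y, z, s, s^{*}) \;=\;
f(Y \mid z; \theta)\; g(z; \phi)\; p(s \mid z)\; r(s^{*} \mid s, Y, z; \psi)
```

In this sketch the design factor p(s | z) does not involve Y, which is what makes the design uninformative once z is conditioned upon, whereas the response factor may depend on Y and so has to be modelled explicitly.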

The failure of both theoretical and practical statisticians to integrate sampling and non- 
sampling errors into measures of total survey error even after 50 years of intensive research 
must be noted as one of the failures of this important branch of statistics. But again things 
are changing: the mood now is no longer merely to report sampling errors and to give vague 
warnings about the potential size of non-sampling errors, but to attempt to measure total 
survey error, recognising that some non-sampling biases can far exceed sampling errors. 

Section 4 of the paper is devoted to the analysis of survey data, to the analytic rather than 
descriptive uses of surveys. Here the design-based, model-based dispute pales into 
insignificance. Analysts must face up to all the classical problems of model choice, estimation 
and testing, residual analysis and so on, which make up mainstream statistics. Cinderella is 
at last dancing with the Prince. 

My final comments are again personal. If you look at the references at the end of the paper, 
and if you consider the additional areas which I have discussed, then you will see that Jon Rao 
has contributed important papers in every area. I think that it was particularly appropriate 
that he was invited to write this paper. I congratulate both authors on their fine paper. 
ABSTRACT 


The basic theme of this paper is that the development of survey methods in the technical sense can only 
be well understood in the context of the development of the institutions through which survey-taking 
is done. Thus we consider here survey methods in the large, in order to better prepare the reader for 
consideration of more formal methodological developments in sampling theory in the mathematical 
statistics sense. After a brief introduction, we give a historical overview of the evolution of institutional 
and contextual factors in Europe and the United States, up through the early part of the twentieth 
century, concentrating on governmental activities. We then focus on the emergence of institutional bases 
for survey research in the United States, primarily in the 1930s and 1940s. In a separate section, we take 
special note of the role of the U.S. Bureau of the Census in the study of non-sampling errors that was 
initiated in the 1940s and 1950s. Then, we look at three areas of basic change in survey methodology 
since 1960. 


KEY WORDS: Censuses; Cognitive aspects of survey design; Non-sampling errors; Probability sampling; 
Survey organizations. 


1. INTRODUCTION 


The development of survey methods in the technical sense can only be well understood in 
the context of the development of the institutions through which survey-taking is done. The 
purpose of this paper is to consider survey methods from this broader perspective in order to 
better prepare the reader for consideration of more formal methodological developments in 
sampling theory in the mathematical statistics sense that are described in numerous texts on 
sampling as well as in Rao and Bellhouse (1990). Although our viewpoint and organization 
is somewhat new, we have relied heavily on secondary sources which provide detailed exposi- 
tions alternative to ours. Our paper focuses on the American experiences in the development 
of survey methodology, but it sketches some background of the much broader social science 
and institutional settings out of which survey methodology grew. 

In the next section we present a very brief historical overview of the evolution of this institu- 
tional and contextual background, up through the early part of the twentieth century. We see 
two broad strands - social research and censuses. We begin with a short synopsis of the early 
history of European social research, turn to a brief overview of census-taking, especially in 
the context of the United States, and then take up the role of the International Statistical 
Congresses in the late nineteenth and early twentieth century in establishing the importance 
of sampling. Even following these congresses, the possible role of probability in sampling was 
not broadly understood. Further steps required an institutional base. 

In section 3, we focus on the emergence of other U.S. institutional bases for survey research 
in the 1930s and 1940s. In particular, we note that a missing institutional ingredient was pro- 
vided by the creation of the U.S. statistical agencies at the beginning of the twentieth century. 
Then a number of factors, including the depression of the 1930s, the development of probability 
sampling methodology, and a U.S. federal statistical coordinating function came together to 
launch the modern era of survey methodology in the U.S. We also review market research and 
the universities as institutional bases. In section 4, we take special note of the role of the U.S. 
Bureau of the Census in the study of non-sampling errors that was initiated in the 1940s and 
1950s. In section 5, we look at some of the basic changes in survey methodology since 1960, 
focusing on technological advances, the role of longitudinal surveys, and the recent movement 
to explore cognitive aspects of surveys. 


2. A HISTORICAL OVERVIEW OF THE INSTITUTIONAL BACKGROUND 
FOR MODERN SURVEY METHODS 


2.1 Institutional Bases in Early European Social Research 


One set of roots of the U.S. tradition of survey methodology and data collection technology 
is in early European social research (cf. Lecuyer and Oberschall 1978, from whose work we 
have drawn). 

In England that tradition can be traced to the seventeenth century. The research, dubbed 
political arithmetic, was based on administrative records (especially parish records) and per- 
sonal observation. It was usually carried out by dedicated individuals, such as John Graunt 
who published his Natural and Political Observations Made Upon the Bills of Mortality in 1662. 
Until the beginning of the eighteenth century, the parish was the unit of local government and 
administration, so that it was sensible to use the clergy as informants for many inquiries. With 
the industrial revolution and the rise of cities, this convenient arrangement broke down, 
necessitating the institution of house-to-house surveys. 

By the 1830s statistical societies were formed in England to investigate social problems. They 
organized committees, which in turn hired agents to go door-to-door to collect data. Although 
the statistical societies disbanded when the social problems seemed solved, similar procedures 
were revived towards the end of the nineteenth century when Booth (1889-1891) sent school 
attendance officers door-to-door to study London’s poor. 

In France, where the government was more highly centralized, early social research was car- 
ried out by the government. District administrators were used as informants to fill out ques- 
tionnaires on the demographic and economic conditions of their districts. By the mid-eighteenth 
century what we might consider an early study of the effects of mass communication was 
carried out in France. Administrators were instructed to spread rumors of increases in taxes 
and of military conscription and to report on the reactions of the populace. 

In the Napoleonic period following the revolution, the French government established a 
national office responsible for gathering survey-like data on population, social situation, 
agriculture, and industry and commerce (Bourguet 1988). While this effort was not fully suc- 
cessful, and while it fell short of census methods as we now understand them, it did set in place 
an institutional structure. During the nineteenth century, France continued the tradition of 
government responsibility for statistical functions through reporting of data by prefects and 
in its Bureau de Statistique. The Napoleonic effort also launched a social science data enter- 
prise in France that explicitly rejected the ideas from the theory of probability as it was then 
known. The French interest in social statistics affected many scientists, such as Laplace and 
Quetelet (a Belgian who studied in France under Laplace), who in turn contributed in major 
ways to the art and science of census-taking, attempting to reintroduce ideas from probability, 
through the use of what we now know as ratio estimation (see Stigler 1986, Chapter 5). After 
the revolution of 1830, the Académie des Sciences Morales et Politiques sponsored prize competitions
that encouraged statisticians to undertake their own research. 

In Germany, the origin of ‘‘statistics’’ (collection of data on the state) was in the univer- 
sities as early as the end of the seventeenth century. By the early nineteenth century this work 
was split into three parts, with descriptive political science and historical/quantitative political 
economy remaining university-based but statisticians collecting data in census bureaus and other 
government agencies. 

In 1872, the Verein für Socialpolitik was founded - part pressure group, part professional 
organization, part research organization. It drew up questionnaires to be answered by sup- 
posedly knowledgeable informants such as landowners, ministers, and notaries. Problems of 
informants’ possibly inaccurate information, haphazardly grouped and imprecise questions, 
and low response rates dogged these efforts. By the early twentieth century, Levenstein (1912) 
published what was probably the first large scale attitude and opinion survey, for which he 
used a snowball sampling technique. At about the same time Max Weber attempted a survey 
of industrial workers, planning to get some information directly from respondents but finding 
that the majority did not care to cooperate. 


2.2 Censuses: A Prelude to Survey-Taking 


Another set of roots of survey methodology is intertwined with the history of methods for 
census-taking and thus we present a brief overview on censuses and census-taking infrastruc- 
tures. Many others have observed that the origins of the modern census are found in biblical 
censuses described in the Old Testament (Madansky 1986) as well as in censuses carried out 
by the ancient Egyptians, Greeks, Japanese, Persians, and Romans (Taeuber 1978). The 
emphasis in the biblical accounts of censuses seemed to be on the results of the enumeration, 
rather than on how the counting was done, although in several instances we are told about 
the rapidity of the process. For most practical purposes we can skip from biblical times to the 
end of the eighteenth century and the initiation of census activities in the United States of America, 
although there is some debate as to whether Canada, Sweden, or the United States should be 
credited with originating the modern census (Willcox 1930). 

In the United States, the first census was taken in 1790 (in 1990 the U.S. government will 
take its bicentennial census) by State officials who were then reimbursed by the Federal 
government. Then, in the next census of 1800, the enumerators were deputies or assistants to 
Federal marshals (Duncan and Shelton 1978). It was only with the 1880 census that the central 
Census Office gained control over field operations and secured the authority to appoint 
enumerators. 

Prior to 1850 the U.S. decennial census considered the family as the unit of interest and 
reported few data on persons. The change to an individual-focus in census-taking was strongly 
influenced by the work of Lemuel Shattuck, one of the ASA’s founders, who had earlier conducted
the Boston census of 1845 (Anderson 1988, pp. 36-37), as well as that of Quetelet, who 
helped to organize the 1846 Belgian census (Willcox 1930). 

Progress on the methodology of census-taking continued, as every 10 years, a special oper- 
ation was mounted to fulfill the constitutional obligation of an enumeration of the U.S. popula- 
tion; however, there was a clear lack of continuity from one census to the next (American 
Economic Association 1899). It was only after the first 12 censuses had been taken that the 
Bureau of the Census was created in 1902 as a permanent agency. Over this period there was 
a steady expansion of the number of censuses of other sorts and the broadening of topics covered 
in addition to simple enumeration. 
2.3 International Statistical Congresses 


The move from censuses to sample surveys was slow and laborious. Kruskal and Mosteller 
(1980) trace some of this movement, especially as it was reflected in the discussions regarding 
surveys that took place at the meetings of the International Statistical Institute (ISI), and our 
exposition here owes much to their work. The groundwork for these meetings was laid in the 
1850s by Quetelet who helped to organize the first of a series of International Statistical Con- 
gresses in 1853. After nine such Congresses from 1853 to 1876, the ISI was founded in 1885. 
It is interesting to note that there is only one index entry for sample surveys in Stigler’s (1986) 
history of statistics before 1900 - to the 1830s work of Quetelet linked to a census method 
suggested by Laplace - and only two index entries in Porter’s (1986) history - one to a 1900 paper 
by Karl Pearson and the other to the work of Kiaer and the ISI. 

As early as the 1895 ISI meeting, Kiaer (1895-1896) argued for a “representative method” 
or “partial investigation”, in which the investigator would first choose districts, cities, etc., 
and then units (individuals) within those primary choices. The choosing at each level was to 
be done purposively, with an eye to the inclusion of all types of units. That coverage tenet, 
together with the large sample sizes recommended at all levels of sampling, was what was judged 
to make the selection representative. 

The idea of less than a complete enumeration was widely opposed, but Kiaer presented 
arguments for it (with some members agreeing and others disagreeing) at ISI meetings in 1897, 
1901, and 1903. Towards the end of this period, the idea of probability sampling entered the 
discussion, but the topic of the representative method seems absent from the records of the 
ISI meetings until 1925. By then the record suggests that the representative method was taken 
for granted, and the discussions centered around how to accomplish representativeness and 
how to measure the precision of sample-based estimates (Bowley 1926; Jensen 1926). Notions 
of clustering and stratification were put forward, but purposive sampling was still the method 
of choice. 

It was not until Gini and Galvani made a purposive choice of which returns of an Italian 
census to preserve and found that districts chosen to represent the country’s average on seven 
variables were, in that sense, unrepresentative on other variables, that purposive sampling was 
definitively discredited (Gini 1928; Gini and Galvani 1929). Soon thereafter Neyman published 
his groundbreaking 1934 paper that demonstrated, among other things, the virtues of probability 
sampling. 


3. THE DEVELOPMENT OF INSTITUTIONAL BASES FOR SURVEY 
RESEARCH IN THE UNITED STATES 


Survey research in the United States grew from a blending of the same three institutional 
bases that had been influential in Europe - private individuals acting as entrepreneurs in the 
private sector, universities, and the government. Early social research in this country (before 
World War I) seems to have followed the earlier British model, being carried out by social 
workers, public health workers, and reformers. An early university involvement was the hiring 
by the University of Pennsylvania in 1899 of W.E.B. DuBois to carry out his study of the 
Philadelphia Negro, conducted as a house-to-house survey. Starting in the 1930s, and especially 
in the period after World War II, the U.S. experienced a flowering of survey methodology in 
the three broad institutional bases: market research and polling, universities, and government. 
But before we describe that flowering we shall take a step backwards and note the establish- 
ment of the U.S. government statistical agencies. 

Recently, Jean Converse (1987) has written an extremely scholarly and graceful study of 
the roots and emergence of survey research in the United States, with special focus on market 
research and polling and on universities. Our exposition on these bases closely follows hers. 
We have separated out the institutional bases both to reflect a social reality and to structure 
our exposition. But there is another social reality that we ask the reader to bear in mind; the 
membranes separating the institutions are permeable. They not only permit the flow of cross- 
fertilizing ideas and methods in all directions; they also permit a somewhat lesser flow of people, 
as individuals move from one sector to another over the course of their careers. 


3.1 The Establishment of U.S. Statistical Agencies 


The establishment of American statistical agencies effectively began in 1863, when the newly 
created Department of Agriculture released the first crop and livestock report to provide infor- 
mation on Union food supplies during the Civil War. This report was based on data from a 
purposive sample of 2,000 farmers in 22 states. This agricultural statistical reporting activity 
has existed in the Department of Agriculture on a continuing basis to the present day and is 
now known as the National Agricultural Statistics Service. By the late 1920s, correlational 
and regression methodology was well established in the work of agricultural statisticians 
(Duncan and Shelton 1978). 

In 1884, Congress voted to establish a Bureau of Labor (later renamed the Bureau of Labor 
Statistics, BLS) to “collect information” on the earnings and the working conditions of 
“laboring men and women.” Under the leadership of Carroll Wright, the first Commissioner, 
BLS expanded its statistical activities to cover such issues as depressions, strikes and lockouts, 
women’s wages, marriage and divorce, and the domestic liquor trade (Norwood and Early 
1984). 

With the creation of the Bureau of the Census in 1902, there were three major U.S. agencies 
in place, each with a mandate to collect national data on a regular basis. During the first three 
decades of the twentieth century, the role of government statistical agencies expanded con- 
siderably and, at the time of the stock market crash of October 29, 1929, data on various facets 
of economic and social life were available. As late as 1932, however, there were few examples 
of probability sampling anywhere in the Federal Government (Duncan and Shelton 1978). 

Difficult though it is to conceive in a period when we are used to receiving reliable readings 
on the unemployment rate monthly, there was no comparable survey data resource available 
in the 1920s and early 1930s. Except for selected monthly non-survey data gathered by BLS 
from most manufacturing industries and some nonmanufacturing industries, there were no 
regular national unemployment figures. In the 1920 census the question on unemployment was 
dropped because of statistical concerns regarding the accuracy of the resulting data. This 
question was restored to the 1930 census because of the wide-spread concerns regarding the 
employment situation. The extensive controversy that surrounded the 1930 unemployment data 
(Van Kleeck 1930) and those from the special January 1931 Unemployment Census was espe- 
cially acrimonious (Anderson 1988), and played a role in the 1932 presidential election campaign. 


3.2 The ASA-SSRC Committee and the Institutionalization of Probability Sampling: 
An Early Bridge 


Thus, at the beginning of the Great Depression of the 1930s in the United States, the federal 
statistical agencies had difficulty responding to the demand for statistics to monitor the effects 
of the programs of President Franklin Roosevelt’s New Deal. In 1933, Secretary of Labor 
Frances Perkins asked Stuart A. Rice, the ASA president, to set up an advisory committee on 
the programs of BLS. This committee grew into the Committee on Government Statistics and 
Information Services (COGSIS), sponsored jointly by ASA and the Social Science Research 
Council (SSRC). Duncan and Shelton (1978) give a detailed account of the activities of COGSIS, 
and for our discussion here two outcomes are worthy of note. 

First, in 1933, COGSIS recommended the creation of a Central Statistics Board (CSB) to help 
coordinate government statistical activities. With the groundwork laid for a coordinated federal 
statistical system, COGSIS and CSB proceeded, in early 1934, to arrange for an interagency 
agreement through which Census would collect basic data on production and labor for BLS. 

Second, COGSIS helped to stimulate the use of probability sampling methods in various 
parts of the Federal government, and it encouraged research on sampling theory, to be done 
by employees of statistical agencies. For example, to establish a technical basis for unemploy- 
ment estimates, COGSIS and CSB organized an experimental Trial Census of Unemployment 
as a Civil Works Administration project in three cities using probability sampling, carried out 
in late 1933 and early 1934. The positive results from this study and the interagency arrange- 
ment mentioned above led in 1940 to the first large-scale, ongoing sample survey on employ- 
ment and unemployment using probability sampling methods. This survey later became the 
Current Population Survey. 

Another somewhat indirect outcome of the COGSIS emphasis on probability sampling took 
place at the Department of Agriculture Graduate School where W. Edwards Deming organized 
a series of lectures in 1937 on sampling and other statistical methods by Jerzy Neyman (1938). 
These lectures had a profound impact on the further development of sampling theory across 
the government as well as in universities. 

What we see happening in this period is the confluence of a number of factors that served 
to launch the use and development of sampling methods in the U.S. government statistical 
agencies. A key prerequisite was the existence of the agencies themselves. A second was the 
methodological advances in sampling theory as encapsulated in Neyman’s landmark 1934 paper. 
What was required to bring these together was the Great Depression, a new administration 
hungry for quality data to assess the impact of its social programs, and the joint ASA-SSRC 
Committee on Government Statistics and Information Services. 


3.3 Market Research and Polling 


The institutional base of survey methodology in U.S. market research and polling traces its 
own pre-history to election straw votes collected by newspapers, dating back at least to the begin- 
ning of the nineteenth century. Often publicity and circulation boosting were more important 
than accuracy of prediction. Converse (1987) points out, however, a more serious journalistic 
base; election polls were taken and published by such reputable magazines as the Literary Digest 
(which had gained a reputation for accuracy before the 1936 fiasco). Then, as now, election 
forecasting was taken as the acid test of survey validity. A reputation for accuracy in “calling” 
elections was thought to spill over to a presumption of accuracy in other, less verifiable areas. 

There was a parallel tradition in market research, dating back to just before the turn of the 
century, attempting to measure consumers’ product preferences and the effectiveness of adver- 
tising. It was seen as only a short step from measuring the opinions of potential consumers 
about products to measuring the opinions of the general public about other objects, either mate- 
rial or conceptual. By the mid 1930s there were several well established market research firms. 
Many of them conducted election polls in 1936 and achieved much greater accuracy than did 
the Literary Digest. It was the principals of these firms (e.g., Archibald Crossley, George 
Gallup, and Elmo Roper) who put polling - election, public opinion, and consumer - on the 
map in the immediate pre-World War II period. 

Data collection technology developed broadly in the market research and polling 
organizations in this era. Sampling was either by purposively selected groups or by quota. Samples were 
large, with the size enlarged sequentially until the law of large numbers caused the mean or 
percentage being estimated to stabilize. Some questionnaires were very informal with the inter- 
viewer instructed to bring certain topics into a conversation - what we might now call an 
unstructured interview. Others were more standardized, but shorter, actual forms. The pro- 
gression seems to have been that as interviewers became more distanced from the primary 
investigators - in space, in education, in training, in identification with the research project, 
and perhaps in their very numerousness - the interview became more standardized. 
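
The practice of enlarging a sample until the estimate stops moving can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sequential_estimate`, its parameters, and the stopping rule (stop when the running percentage changes by less than a tolerance between batches) are hypothetical and are not drawn from the practice of any particular polling house. The sketch also shows why stabilization alone was no safeguard: the estimate settles on the proportion in the list actually sampled, whatever the proportion in the wider population.

```python
import random


def sequential_estimate(frame, batch_size=500, tolerance=0.005,
                        max_batches=200, seed=0):
    """Enlarge a sample batch by batch until the running proportion stabilizes.

    `frame` is the list of 0/1 values actually available for sampling.
    Sampling stops when the running proportion moves by less than
    `tolerance` between consecutive batches (an illustrative rule).
    Returns the final estimate and the total sample size used.
    """
    rng = random.Random(seed)
    total, count, previous, current = 0, 0, None, None
    for _ in range(max_batches):
        batch = [rng.choice(frame) for _ in range(batch_size)]
        total += sum(batch)
        count += len(batch)
        current = total / count
        if previous is not None and abs(current - previous) < tolerance:
            break
        previous = current
    return current, count


# Hypothetical frame in which 40% hold an opinion, even if the share in
# the full population differs: the estimate stabilizes near 0.40.
frame = [1] * 40 + [0] * 60
estimate, n_used = sequential_estimate(frame)
print(f"estimate={estimate:.3f} after {n_used} interviews")
```

A stable estimate here reflects only the law of large numbers applied to the frame; it says nothing about how well the frame represents the target population.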

The same kinds of validity issues that interest survey researchers today surfaced in the period. 
What should be the balance between open and closed questions? (Practice seems to have favored 
a combination; the device of the “opinion thermometer” to calibrate answers was first 
developed by the Literary Digest in 1925.) The pollsters tackled the problem of how to ask sensitive 
questions - about age, income, occupation, and home owning - by providing check lists, func- 
tioning much like contemporary visual aids. Experiments in question wording were carried out 
in the polling houses. 

Market research in this early period, as now, of necessity put a premium on the timeliness 
of results. Then, as now, this tended to create some tension between academics and market 
researchers, with academics believing commercial workers to be corrupted by money and thus 
too far from basic science and commercial workers believing academics were overly concerned 
with the abstract. It is noteworthy, however, that one of the earliest homes of public opinion 
and market surveys was the Psychological Corporation, an organization of academic 
psychologists committed to plowing part of their profits back into the research process. The 
Psychological Corporation carried out its surveys from its Market Surveys Division, organized 
and run by Henry C. Link. 


3.4 The Universities 


But the universities were hardly totally above the polling movement. As early as 1911 the 
Harvard Graduate School of Business established a Bureau of Business Research to carry out 
consumer research. Such household names of social science as Paul Lazarsfeld, Hadley Can- 
tril, and Rensis Likert moved to university affiliations and attached research institutes. 
Lazarsfeld came to the United States in 1933 determined to bring the techniques developed 
in market research to the basic scientific endeavor. He went on to form the Office of Radio 
Research, later to be called the Bureau of Applied Social Research, at Columbia University. 
His myriad contributions included the use of panels and a system of causal analysis. 

Hadley Cantril was an academic who early on collaborated with Lazarsfeld on research on 
radio listening. When the two had a falling out, Cantril established the Office of Public Opinion 
Research at Princeton University. Here studies were carried out to improve data collection tech- 
niques. For example, in investigating the effects of question wording, Rugg and Cantril (1944) 
found that in 1940 - 41 over a six-week period, the percentage of Americans who favored 
“giving aid [to Great Britain] even at the risk of war” varied between 56% and 78%. At the 
same time, the percent in favor of “entering the war immediately” ranged from 8% to 22%. 

Rensis Likert started out teaching at New York University and with a connection to the 
surveys of the Psychological Corporation. Moving to business, he carried out a survey of life 
insurance agents’ attitudes, comparing qualitative and quantitative (mostly questionnaire) 
methods. He then became Director of the Division of Program Surveys at the Department of 
Agriculture. There he worked to standardize questionnaires. When Likert left the Department 
of Agriculture after World War II, he brought his group to the University of Michigan to form 
the Survey Research Center. 

4. FROM SAMPLING THEORY TO THE STUDY OF 
NON-SAMPLING ERROR 


As we have seen above, the introduction of probability sampling into government surveys 
in the mid-1930s came at the time of rapid development in many areas of statistics, and the 
development of a foundation for experimentation and inference more broadly under the 
leadership of such statisticians as R.A. Fisher, Walter Shewhart, Jerzy Neyman, and Egon Pearson. 
Among those who worked on the probability-sampling-based trial Census of Unemployment 
at the Bureau of the Census were Calvert Dedrick, Morris Hansen, Samuel Stouffer, and 
Frederick Stephan (Anderson 1988; Duncan and Shelton 1978). Hansen was then assigned with 
a few others to explore the field of sampling for other possible uses at the Bureau, and went 
on to work on the 1937 sample Unemployment Census. After working on the sample compo- 
nent of the 1940 decennial census (under the direction of Deming), Hansen worked with others 
(e.g., Jerome Cornfield, Lester Frankel, William Hurwitz and J. Steven Stock) to redesign 
the unemployment survey based on new ideas on multi-stage probability samples and cluster 
sampling (Hansen and Hurwitz 1942, 1943). They expanded and applied their approach in 
various Bureau surveys, often in collaboration and interaction with others, and this effort 
culminated in 1953 with the publication of a two-volume compendium of theory and method- 
ology (Hansen, Hurwitz and Madow 1953). The recent interview with Hansen (Olkin 1987) 
and the Duncan and Shelton (1978) volume provide interesting and detailed descriptions of 
the developments during this period. 

Virtually independent and often complementary contributions to sampling theory came 
via the statistical sampling work in agriculture by P.C. Mahalanobis and students in India and 
by Frank Yates and William Cochran in England. Cochran’s 1939 paper is especially notable 
because of its use of the analysis of variance in sampling settings and the introduction of super- 
population and modeling approaches to the analysis of survey data (see Fienberg and Tanur 
1987, 1988 for related discussion on the design and analysis linkages between sampling and 
experimentation). In the 1940s, as results from these two separate schools appeared in various 
statistical journals, we see some convergence of ideas and results. 

The 1940s saw a rapid spread of probability sampling methods to other government agen- 
cies. It was only after the fiasco of the 1948 presidential pre-election poll predictions (Mosteller 
et al. 1949) that market research firms and others shifted towards probability sampling. 
Even today many organizations use a version of probability sampling with quotas (Sudman 
1987). 

Amidst the flurry of activity on the theory and practice of probability sampling during the 
1940s, attention was also being focused on issues of nonresponse and other forms of non- 
sampling error. In a review of work on errors in surveys, Deming (1944) listed 13 factors affec- 
ting the ultimate usefulness of surveys (note that most of these are nonsampling errors): 


1. variability in response; 

2. differences between different kinds and degrees of canvass; 

3. bias and variation arising from the interviewer; 

4. bias of the auspices; 

5. imperfections in the design of the questionnaire and tabulation plans; 

6. changes that take place in the universe before tabulations are available; 

7. bias arising from nonresponse (including omissions); 

8. bias arising from late reports; 

9. bias arising from an unrepresentative selection of date for the survey, or of the period 
covered; 

10. bias arising from an unrepresentative selection of respondents; 

11. sampling errors and biases; 

12. processing errors (coding, editing, calculating, tabulating, tallying, etc.); 
13. errors in interpretation. 


Most of the errors described in this list either had been or would become the focus of research 
by statisticians at the Bureau of the Census. 


A milestone in this effort to understand and model non-response errors was the develop- 
ment of an integrated model for sampling and non-sampling error in censuses and surveys, 
in connection with planning for and evaluation of the 1950 census (Hansen, Hurwitz, Marks 
and Mauldin 1951). This analysis-of-variance-like model, or variants of it, has served as the 
basis of much of the work on non-sampling error over the past 35 years, both inside and out- 
side the Bureau of the Census. An excellent qualitative analysis of the error structure of the 
Current Population Survey is given in Brooks and Bailar (1978), and reviews of the non- 
sampling error literature are given by Mosteller (1978) and Fienberg and Tanur (1983). Finally, 
we note that Groves’ (1989) recent book gives an updated approach to a variant of this census 
model, making a careful distinction between random and fixed components that arise from 
the various sources of error. 

The paper by Bailar (1990) in this issue contains a detailed discussion on non-sampling error 
from the perspective of the Bureau of the Census. 


5. CHANGING DIMENSIONS OF SURVEY METHODOLOGY 
AFTER 1960 


The decades of the 1960s and 1970s saw polls and surveys becoming an all-pervasive fact 
of American life, beginning with the hard-fought presidential election of 1960 in which both 
candidates (Kennedy and Nixon) commissioned and relied on private polls of the electorate. 
Here we focus on three major areas of innovation during recent decades. We refer the reader 
to other presentations for such important topics as imputation for incomplete data and the 
ever-present controversies surrounding inferences from survey data (e.g., see Fienberg and 
Tanur 1983, 1986). 


5.1 Mode of Interviewing: The Role of Telephones and Computers in Surveys 


The development and diffusion of technology, especially telephones and computers, strongly 
influenced survey practice in these decades. U.S. telephone coverage, which was estimated to 
have been only 35% in 1936 and hence contributed to the Literary Digest’s problem (Massey 
1988), reached 75% by 1960 and 88% in 1970 on its way to around 93% in 1986 (Thornberry 
and Massey 1988). Thus telephone surveys, often based on random digit dialing (RDD) 
techniques, became increasingly prevalent and accurate. The movement began among 
commercial survey researchers, with government and academic researchers lagging behind because of their 
concerns over differential coverage by such variables as income and race (Trewin and Lee 1988) 
and accompanying fears of lack of ‘‘representativeness’’. Indeed, most government uses of 
telephone interviewing remain as follow-ups of initial in-person contacts (as in the Current 
Population Survey which has been using telephone interviewing for households in later months 
of the survey since 1954). Only recently has there been a marked shift towards the use of RDD 
for government surveys. Groves and Kahn (1979) provide a review of work on telephone inter- 
viewing and, by and large, they document the comparability of survey results through com- 
parisons of data gathered by personal interviews and by telephone. 
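
The basic idea behind random digit dialing can be sketched in Python: telephone numbers are formed by attaching random four-digit suffixes to a list of prefixes, so that unlisted numbers have the same chance of selection as listed ones. The function name `rdd_sample`, its parameters, and the example prefixes are hypothetical, and practical RDD designs add refinements (such as screening out nonworking banks of numbers) that are not shown here.

```python
import random


def rdd_sample(prefixes, n, seed=None):
    """Draw n distinct telephone numbers by random digit dialing.

    Each number is a randomly chosen prefix (area code plus exchange)
    followed by a uniformly random four-digit suffix.
    """
    if n > len(set(prefixes)) * 10_000:
        raise ValueError("n exceeds the number of possible telephone numbers")
    rng = random.Random(seed)
    numbers = set()
    while len(numbers) < n:
        prefix = rng.choice(prefixes)
        suffix = rng.randrange(10_000)  # 0000 through 9999
        numbers.add(f"{prefix}-{suffix:04d}")
    return sorted(numbers)


# Hypothetical prefixes, used only for illustration.
sample = rdd_sample(["212-555", "313-555", "609-555"], n=5, seed=1)
for number in sample:
    print(number)
```

Because every suffix within a prefix is equally likely, the selection probability of each number is known, which is what distinguishes this approach from dialing out of a directory.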

The advent and proliferation of the computer meant that the tasks of analyses of survey 
responses could be carried out much more rapidly and broadly than ever before. This led to 
an increase in the number of surveys carried out under all institutional auspices. In retrospect 
it seems only natural that computer technology should be combined with telephone technology 
to produce systems of computer assisted telephone interviewing (CATI). These systems 
provide automated questionnaires that carry out skip patterns and display the appropriate 
question on a monitor screen, schedule (and often actually place) calls and callbacks, carry 
out randomizations, and automate data entry, in addition to other functions. CATI systems 
were developed by U.S. market research organizations in the early 1970s in part to keep track 
of respondent characteristics and thus ensure that quotas are precisely and efficiently met 
(Nicholls 1988). Chilton Research was one of the commercial CATI pioneers, using a CATI 
system for surveys intended to determine the level of customer satisfaction with services 
provided by the telephone companies (Nicholls and Groves 1986). Largely independently, 
university survey organizations began to develop their own CATI systems in the mid-1970s, 
and introduced them to the larger statistical community with an emphasis on their usefulness 
for documentation, standardization, and interviewer flexibility. While government agencies 
exhibited early interest in CATI, they have only recently begun to actually employ systems, 
sometimes on an experimental basis and often in tandem with other data collection 
methodologies, as in panel designs where the first interview is carried out in person. At this 
writing we see the beginnings of a movement to the use of computer-assisted personal inter- 
viewing (CAPI), a development made possible by the technological advances that produced 
truly portable laptop computers. 
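
The automated skip patterns at the heart of a CATI system can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the question wording, the question identifiers, and the `run_interview` function are hypothetical and do not describe any actual CATI product.

```python
# Each question carries its text and a routing rule that maps the answer
# to the identifier of the next question (None ends the interview).
QUESTIONNAIRE = {
    "worked": {
        "text": "Did you do any work for pay last week?",
        "next": lambda answer: "hours" if answer == "yes" else "looking",
    },
    "hours": {
        "text": "How many hours did you work?",
        "next": lambda answer: None,
    },
    "looking": {
        "text": "Have you been looking for work?",
        "next": lambda answer: None,
    },
}


def run_interview(questionnaire, start, get_answer):
    """Walk the questionnaire, following the skip pattern for each answer.

    `get_answer(question_id, text)` supplies the response; in a real system
    it would display the question on the interviewer's screen.
    """
    responses = {}
    current = start
    while current is not None:
        question = questionnaire[current]
        answer = get_answer(current, question["text"])
        responses[current] = answer  # data entry happens as a by-product
        current = question["next"](answer)
    return responses


# Scripted respondent who did not work last week: "hours" is skipped.
scripted = {"worked": "no", "looking": "yes"}
result = run_interview(QUESTIONNAIRE, "worked", lambda qid, text: scripted[qid])
print(result)
```

Holding the routing rules in the instrument itself, rather than in the interviewer's head, is what allows the standardization and documentation that the university developers emphasized.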


5.2 Longitudinal Surveys 


While panel surveys were conducted in connection with the 1924 and 1940 U.S. presiden- 
tial election campaigns (Rice 1928; Lazarsfeld et al. 1944), interest in over-time survey data 
did not really become fashionable in social research until the 1960s. This is all the more 
surprising when we realize that the Current Population Survey has traditionally had a 
rotating-panel structure and, since 1953, many respondents are interviewed as many as 8 times 
over a 16 month period. This rotating-panel structure was originally intended to produce 
estimates of change in aggregate quantities that had smaller variances than those from repeated 
cross-sections but, in principle, the CPS could have been analyzed in panel form on a regular 
basis. The fact that the CPS is a survey of sample addresses and not individuals or households 
is a major obstacle to the use of it as a panel survey (see related comments on the National 
Crime Survey in Fienberg 1978), but this has not prevented the elaborate use of the CPS to 
study gross flows in individual employment status (e.g., see Abowd and Zellner 1985 and Stasny 
1988). 
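
The variance advantage of a rotating panel for estimating aggregate change can be illustrated with a simplified calculation. The sketch assumes simple random samples of equal size n at two time points, a common unit variance σ², a fraction λ of units retained from the first sample to the second, and a correlation ρ between a retained unit's two responses. These symbols and assumptions are introduced here for illustration only; the estimators actually used for the CPS are more complex.

```latex
\begin{align*}
\operatorname{Var}(\bar{y}_2 - \bar{y}_1)
  &= \operatorname{Var}(\bar{y}_1) + \operatorname{Var}(\bar{y}_2)
     - 2\operatorname{Cov}(\bar{y}_1, \bar{y}_2) \\
  &= \frac{\sigma^2}{n} + \frac{\sigma^2}{n}
     - 2\,\frac{\lambda n\,\rho\,\sigma^2}{n^2} \\
  &= \frac{2\sigma^2}{n}\,(1 - \lambda\rho).
\end{align*}
```

With independent cross-sections (λ = 0) the variance of the estimated change is 2σ²/n. With, say, λ = 0.75 and ρ = 0.8 it falls to 0.4 times that value, which is the sense in which overlapping samples yield smaller variances for estimates of change when responses are positively correlated over time.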

Not all survey attempts to measure change need be based on longitudinal data; often repeated 
cross-sections can do at least as well if not better in measuring aggregate change. By the 1970s 
the Gallup Poll and others had developed a tradition of asking the same questions repeatedly 
and reporting the results in newspapers. These established time series became incorporated into 
the burgeoning Social Indicators movement. In 1972 the National Opinion Research Center 
first fielded the General Social Survey (GSS), funded by the National Science Foundation. GSS 
was designed by a broadly based group of academics to provide periodic readings on social 
indicators and to provide an original data set for use by students and academics doing modestly 
funded research. For purposes of continuity, the designers incorporated into GSS many ques- 
tions first developed by Gallup and other commercial pollsters, yielding a fruitful cross- 
institutional collaboration (e.g., see Smith 1975). 

The basic idea behind the conduct of longitudinal surveys of panels, however, is to measure 
changes over time, not by comparing the changes in aggregate quantities, but by focusing on 
individual change. Such surveys typically focus on changes in status, the duration of activities, 
and events occurring over time. The rise of interest in longitudinal panel surveys occurred 
primarily outside the government, and early examples are the Panel Study of Income Dynamics, 
conducted by the Institute for Social Research at the University of Michigan annually since 
1968; the National Longitudinal Surveys of Labor Market Experience, sponsored by the Center 
for Human Resources Research at Ohio State University beginning in 1966, and currently 
funded by BLS; and the Longitudinal Retirement History Survey, sponsored by the Social 
Security Administration from 1969 to 1979. The 1970s saw expanded use of longitudinal panel 
surveys, especially under government auspices (e.g., see Boruch and Pearson 1988), but the 
basic survey methodology used often resembled that for traditional cross-sectional surveys. 
Only in the late 1970s did researchers begin to question the conventional wisdom about 
longitudinal survey design and analysis and to explore such fundamental issues as the defini- 
tion of a longitudinal family (for a discussion, see Fienberg and Tanur 1986). 

In the 1980s, interest in longitudinal panel surveys expanded and considerable attention was 
focused on aspects of non-sampling error such as attrition and on issues of data management 
and analysis. Kalton et al. (1989) includes a number of papers on these topics. 


5.3 Cognitive Aspects of Surveys 


As a result of systematic efforts to improve survey methodology over the past forty years, 
survey researchers have evolved a highly developed art of questionnaire design and interview 
procedures to reduce nonsampling errors, such as those described in Deming’s list above 
(e.g., see Payne 1951), and they have carried out many scientific studies to test aspects of that 
art (e.g., see Sudman and Bradburn 1974, Bradburn and Sudman 1979, and Schuman and 
Presser 1981). Until recently, however, research on understanding the survey interview situation 
has been relatively unsystematic. The recent change came, in part, through the recognition that 
other fields, in particular cognitive psychology, had insights that would assist survey resear- 
chers in examining the interview process. 

Among non-sampling errors are those occasioned by the cognitive processes that respondents 
and interviewers are required to exercise in the survey interview situation. Respondents must 
often recall events and make judgments or estimates, and must always face issues of comprehen- 
sion of the questions asked - their meaning to respondents as well as their meaning to inter- 
viewers. Survey researchers are now beginning to draw on the concepts of cognitive psychology 
and the expertise of cognitive psychologists to investigate more systematically these issues of 
non-sampling error. We note especially that the exploration of meaning is not new to the enter- 
prise of survey research. Indeed, Cantril (1944) devotes two chapters to reporting the results 
of experiments on the meaning and wording of questions. These experiments used many of 
the same probing and paraphrasing techniques used in today’s cognitive laboratory. 

This explicit movement to study cognitive aspects of surveys originated in a 1981 conference 
sponsored by the Bureau of Social Science Research and the Bureau of Justice Statistics that 
brought together cognitive psychologists and survey researchers to concentrate on the National 
Crime Survey. A more intensive 1983 conference, sponsored by the Committee on National 
Statistics (CNSTAT) of the National Research Council, concentrated on the National Health 
Interview Survey (Jabine et al. 1984). From the beginning the movement was, by design, a 
partnership between people from academia, from research institutes and other academic 
institutions, and from the government. 

A direct outgrowth of the CNSTAT conference was the establishment of a Questionnaire 
Design Research Laboratory at the U.S. National Center for Health Statistics under the leader- 
ship of Monroe Sirken to do pretesting (in parallel with full scale field testing) of major govern- 
ment surveys. It employs government personnel, brings in visiting scholars, and contracts with 
academics and people in research institutes to carry out its mission. This has been followed 
by the establishment of similar laboratories at the Bureau of Labor Statistics and the Bureau 
of the Census. Another outgrowth is the establishment of the Social Science Research Council’s 
Committee on Cognition and Survey Research, which is, itself, both cross disciplinary and 
cross institutional. The Committee has fostered research in such directions as the interactive 
process of the survey interview, the uses and pitfalls of retrospective memory, and issues in 
measuring pain in a survey context. Examples of other outgrowths of this movement are (a) 
an investigation by the OECD’s Working Party on Labor Statistics of cognitive aspects of labor
surveys, addressing such issues as the meaning of “looking for work” - a knotty conceptual
problem within a culture, and even more problematic across cultures (Schwarz 1987), (b) work 
at combining the cognitive perspective with statistical work on the embedding of experiments 
within surveys (Fienberg and Tanur 1989), (c) international conferences on work at the inter- 
face of cognition and survey methods (e.g., see Hippler, Schwarz and Sudman 1987). 

At the same time that methodological techniques of the cognitive laboratory are being used 
to shape questionnaire design, findings from the cognitive psychology laboratory are being 
taken into the field in order to test their generalizability and thus enrich the academic field 
of cognitive psychology, as well as to ascertain their usefulness for the survey enterprise. Here 
is yet another instance of interaction between the academic world and the government. For 
example, a laboratory finding is that people recall visits to health care providers more easily 
and accurately if they begin with the earliest first (Fathi, Schooler and Loftus 1984). A recent 
investigation explores whether this advantage holds in the field situation of the pre-test of the 
NHIS (White and Berk 1987). 

The movement to integrate methods from the cognitive sciences into the design of sample 
surveys is important for several reasons. First, it has brought a renewed scientific base to the 
problems of questionnaire design. Second, it has opened up the survey domain to the study 
of selected cognitive phenomena. But most important, it has brought new vigor to the survey
enterprise and raised anew issues about the structure and format of the survey interview, going 
far beyond questionnaire design, that many statisticians thought were resolved in the 1940s 
and 1950s. 


6. COMMENTS 


Traditional reviews of the history of survey methods have focused on the role of probability 
sampling and its refinements, and occasionally on the study of non-sampling errors. Here we 
have attempted to set this methodological history in the context of the tradition of social science 
research that evolved over the nineteenth and early twentieth centuries and the institutions, 
in and outside of government, that facilitated and occasionally directly spawned the 
methodological developments. This perspective should help remind readers that factors other 
than the advance of statistical theory have helped to shape the survey domain as we know it 
today. It should also help them follow the evolution of survey theory and practice as it con- 
tinues to be shaped by institutional change. 

There is an additional facet of institutional shaping of the survey enterprise that we have 
not addressed heretofore. We wrote above about the permeability of the membranes separating 
the three sectors: government, market research (the private domain), and the universities and 
other academic institutions. We believe that these membranes are becoming even more
permeable with the increased presence of a fourth kind of institution, which we shall refer to 
as a “bridge”. We saw earlier how the ASA-SSRC Committee on Government Statistics and
Information Services, a bridge between academia and government, prepared the ground for 
federal statistical coordination. ASA and SSRC continue to provide bridging functions, but 
other such institutions also exist. 

Some vivid examples of other bridges come to mind. For over 40 years the American 
Association for Public Opinion Research has been bringing together survey practitioners from
all sectors in local chapters and in national conferences at which new findings are disseminated 
and issues of common concern are discussed. The National Science Foundation program on 
Measurement Methods and Data Improvement (MMDI), under the direction of Murray 
Aborn, has explicitly seen as part of its mandate the fostering of government/academic 
collaboration. The mission has been implemented, for example, through the funding of 
research by academics that both uses and improves government databases (the 1983 seminar 
on cognitive aspects of survey methodology was sponsored by MMDI) and the funding of 
an ASA-sponsored fellowship program. That fellowship program places academic researchers 
for a semester or a year in government statistical agencies to carry out their own research, 
bring new ideas to the agency, and return to their academic bases with new knowledge and 
contacts in the federal agencies and new awareness of government databases and statistical
concerns. The National Research Council, an arm of the National Academy of Sciences, 
maintains a Committee on National Statistics that brings statisticians from academia and the 
private sector together to interact with representatives of the government agencies. Here, in 
formal panel studies and informal interaction, individuals come to know one another and 
common problems are tackled. 

While these and other bridges will surely not totally erase the boundaries between the sectors, 
we see their existence as a positive force for progress in the development of survey method- 
ology. Developments in one sector move more quickly to others across these bridges, but 
perhaps more important, the bridges facilitate a process whereby problems faced by any sector 
become legitimate research questions in all sectors. 


COMMENT


ROBERT M. GROVES


The writing of histories of the development and use of survey methods signals a certain 
maturation of the field. Currently, we are seeing the fiftieth anniversary of several important 
survey innovations - Neyman’s breakthrough papers in stratification, the start of the U.S. 
Current Population Survey, and the greater visibility of election polling. With our attention 
called to such developments it is natural to review the intervening years, seeking to find some 
theme for events affecting the field. Professors Fienberg and Tanur have completed such an 
exercise in their paper. 

My comments will review key parts of the work, offering comments as I proceed, and then 
note some errors of nonobservation, misplaced emphases, and other minor quibbles. 

Fienberg and Tanur express the purpose of their paper in two ways: “to note that technical
developments in surveys can be understood only in the context of institutions within which
they occur” (p. 31) or, at another point, to note that “factors other than the advance of statistical
theory have shaped the survey domain” (p. 42). Consistent with this they note:


1. the role of ruling, governing institutions which perceive a need for information on the 
population’s welfare or its reaction to taxation; 


2. later, the role of academics in the social sciences in framing central statistical and measure- 
ment issues in surveys; 


3. the role of mass media use of surveys for election and current events monitoring; and 


4. still later, the use of surveys by commercial entities in the market economy. 


They document the resolution of controversies in the government sector about use of probability 
sampling. 

Along the way we learn some interesting facts - for example, that for 12 U.S. censuses 
(120 years) there was no permanent organization for the Census Bureau; that the Department 
of Agriculture data collection began with the need for information about food supplies in the Civil
War; that another boost for surveys occurred in the New Deal’s creation of government
programs. There seems to be a recurring theme here that governments emphasizing services for
the welfare of the populace demand more information about their societies than do those 
pursuing other goals. In addition, we see that governments most sensitive to public opinion 
demand more measures of that opinion (that reminds this reader of Gallup’s early metaphor 
of the survey as a voting analog). 

The focus in the paper on the role that institutions played in the development is convincing 
only for parts of the review. For example, the institutional focus is appealing in describing 
Lazarsfeld’s evangelical efforts to bring commercial survey and academic inquiry together. 
The role of the Bureau of Applied Social Research at Columbia University in his partial success 
at that is enlightening. So too the move of Likert and others from a government agency home 
(U.S. Department of Agriculture) to academia in order to spread the method to new domains 
is largely a story of groups of people and organizations which make them effective. 

However, the identification of organizations or institutions as the focus can be 
misunderstood as the stimulus to developments. Nothing I read in the paper changes my opi- 
nion that the survey field at its origins attracted broad, creative thinkers. Many were intelligent 
and charismatic; they led by ideas and mobilized others to work diligently at the definition 
of the new field. Institutions permitted this to happen. They didn’t produce the developments.
They were homes for the best and brightest. 


Within the focus on the institutional, I wished the emphasis of the paper might have been 
placed more on two related points: 


1. Different tasks were more easily accomplished in the different domains. For example, 
government agencies were by their nature restricted to questions of monitoring social welfare, 
the commercial, to newsworthy or dollar worthy interests, and the academic to longer term, 
more basic social issues. Those involved in early developments shaped their agenda to the 
goals of the organization. 


2. Stories of the early days of survey research, as told by those who lived them, are filled with
the excitement of a new field. I missed in the paper sufficient acknowledgement that the 
young researchers involved in the work shared an evangelical mission - spreading the gospel 
of probability sampling, inventing new methods of interviewing because nothing existed. 
The institutional focus misses the human drama of those days. 


Fienberg and Tanur also note ‘‘the membranes separating the institutions are extremely 
permeable’’. That is, researchers move back and forth between the institutions, contributing 
to each of them, and transferring knowledge as they move. The evidence the authors cite is 
the experience of Lazarsfeld addressing basic design issues while conducting radio audience 
research within an academic setting and of Likert moving from the insurance industry to the 
Department of Agriculture to the University of Michigan. These moves seem the exception 
rather than the rule. I have not conducted the appropriate careerline research to demonstrate 
this, but my impression is that the fences between the sectors have been and remain high and 
painful to transgress. Further, movement among the academic, government, and commercial
sectors is asymmetric. Rarely is there movement from the commercial or government sector to the academic
sector (current demands on publication history prevent this). The government-commercial inter- 
change is larger. 


The result of this insularity is the development of techniques not shared across the different 
sectors (edit and imputation schemes, nonresponse reduction techniques). The three sectors
to some extent have developed their own language to describe their work (e.g., “stem and
banners”, “tabs” versus “contingency tables”).

The membrane metaphor also fails to observe the large differences in the centrality of surveys 
to organizations in the three sectors. Academic survey research is not central to any university 
in the world. It was not central early in the history of the method (viz. the inability of the Likert 
group to obtain university parking stickers because of their nonfaculty status). Even now it
is often viewed on many campuses as a haven of technicians (several steps below the chemistry
laboratory staff). In contrast, there are government and commercial organizations
fully devoted to survey design, collection, and analysis. These have decision-making hierar- 
chies constantly monitoring cost and error structures of surveys without the ongoing debate 
about the relative worth of the enterprise. 


The paper ends with a discussion of three developments since 1960 that are important to 
understanding surveys. At this point, the institutional context is dropped as the organizing 
principle of the paper and innovations are the focus. Three developments are highlighted: 
a) the use of the telephone as a data collection medium and later developments in computer 
assisted telephone interviewing (CATI); b) the use of longitudinal surveys to study micro- 
level change over time; and c) the application of cognitive psychological concepts to survey 
methods. 

The authors note the movement of mode of data collection from face to face to phone and
development of CATI, but they fail to note that this is largely a U.S. phenomenon in the academic
and government sectors (the commercial side had done it years ago). Indeed, it is an example 
of distinctive methodologies pursued by the three sectors. I share their belief that the merits 
of longitudinal surveys are increasingly being recognized and note that the 1980’s is seeing this 
spread internationally. The Fienberg and Tanur team was instrumental in launching the U.S. 
effort to apply cognitive psychological concepts to survey measurement, and we are in their
debt for this.

The paper does not make it clear whether the authors believe that CATI, longitudinal surveys,
and the effort to “cognitize” survey methodological research are the three most important
developments in surveys, but they clearly omit several other important ones. We can all choose 
our three most important developments since 1960; here are some other candidates: 


1. Development of Generalized Statistical Software Packages 


This development greatly expanded the number of researchers who could directly pose and 
answer questions using survey data. In the statistical and social sciences at this writing, it is 
common for undergraduates to perform analyses of survey data whose complexity would have 
prevented their being done 25 years ago. 


2. Existence of Survey Data Archives 


The archiving of survey data on computer media was a further democratizing force in survey 
analysis. With those developments replication and extension of analysis, a key component of 
the structure of scientific advance, became trivial. Unfortunately, there were also deleterious 
effects. Analysts of survey data could do their work in complete ignorance of the survey design, 
of the interviewer training and supervision guidelines, of nonresponse rates, and of a host of 
other design features known by those conducting the survey. 


3. Growth of Commercial and Nonprofit Industry to do Government Surveys 


The U.S. is distinctive in its reliance on academic and commercial groups to conduct surveys 
on behalf of government agencies. Some of this exists in many Western countries, but to a 
much smaller degree. This suggests that a cross-cultural strain in the paper might be interesting - 
to identify unique histories of survey research in various societies. 


4. 1960 as Beginning of Widespread Acceptance in Academic Circles of the Social 
Psychological Model of the Interview 


This typically describes survey interviews as “conversations with a purpose” and focuses the 
researcher’s attention on the role of the two actors in the errors produced during measurement. 


5. Ubiquity of Surveys 


Survey measurement is now a way of life for most large corporations (prior to the breakup 
of AT&T in the U.S., the corporation conducted over 7 million customer satisfaction interviews
annually). Surveys are viewed as irreplaceable sources of information about customers, sup- 
pliers, and the general society. 


6. Nonresponse and the Growing Reluctance of the Population to be Measured 


This is certainly a phenomenon of great import to survey researchers in most Western coun- 
tries. With statistical inference to large populations one of the key virtues of surveys versus other 
data collection schemes, this issue strikes at the heart of the tool. Again, a cross-national theme 
to the paper would have highlighted these issues. 

We can apply the superpopulation metaphor to any historical account - that is, any series
of events (which later we call history) is but one realization of an infinite set of possible series 
which defines the universe of possible realities. This fits the set of questions that remain 
unanswered. 


1. Why after almost a century hasn’t survey research fully evolved into a profession (with 
specified standards and training criteria)? 


2. Why is there so little formal educational structure for survey researchers to get their knowl- 
edge base? Why are there departments of communications, operations research, naval 
architecture but none of survey research (teaching sampling, questionnaire design, data 
analysis)? 


3. Would public education about surveys and statistics (like the ASA/NSF program in quan- 
titative literacy) have made an impact on acceptance of surveys? 


We are indebted to the Fienberg/Tanur team for reviewing our collective past. They have 
helped chronicle the birth and first 50 years or so of what is now an important component in 
most societies of the world. I do hope that the year 2040 will see the need to ask Fienberg and 
Tanur to update their paper for that occasion. I hope they will be able to report innovation during 
those 50 years that made a difference in survey methods. 


ABSTRACT


Drawing upon experiences from developments at the U.S. Bureau of the Census, the paper briefly traces 
some contributions made by practitioners to the theory and application of censuses and surveys. Some 
guesses about future developments are also given. 


1. INTRODUCTION


In the United States, the federal government has led the way in the development of statistical 
methodology in censuses and surveys. I will confine my remarks to examples from the U.S. 
Bureau of the Census and will discuss four main areas of work - the development of sampling 
methods, non-sampling error, seasonal adjustment, and the development of methods to protect 
the confidentiality of respondents, usually called disclosure avoidance techniques. Finally, I 
will hazard some guesses about future developments.


2. SAMPLING 


The story of sampling in the U.S. federal government is primarily the story of a remarkable 
group of people at the Census Bureau, led by Morris Hansen and William Hurwitz. When one 
considers that the Census Bureau was committed to probability sampling in the early 1940’s, 
one wonders: how could an innovation of this type have occurred so quickly in such a conser- 
vative institution? The adoption of innovative methods often takes a very long time and I 
suspect the Bureau is much slower in adopting and promoting new methodology today. Hansen 
has given three reasons why he thinks sampling was accepted relatively quickly by the subject- 
matter divisions of the Bureau. They are: (1) support from the top, (2) conscious development 
of a team-work approach with the subject-matter divisions, and (3) the development of a corps 
of sampling experts (later, methods specialists) in the subject-matter divisions who were respon- 
sible to the Statistical Research Division (SRD) on technical matters. I think he left out one 
key ingredient and that is the force and the spirit of the dynamic duo and their cohorts. 

In 1936, the Bureau began exploration of sampling and potential applications. Some 
sampling was already in use, but not probability sampling. There was judgment sampling and 
sampling of some large establishments. However, there was little or no theory to guide sampling 
approaches. In 1937, Congress authorized a national voluntary registration of the unemployed 
and partially employed. A questionnaire was to be delivered by the Post Office to every 
household. There was some concern that this voluntary registration could have some bias, so 
an enumerative check census was put in place in a sample of areas. The check census required 
interviewing all households within a probability sample of postal delivery routes. The mail 
carriers did the interviewing and identified and sorted the voluntary mail returns. They then
provided separate counts for each postal route, including the sample postal routes. This then
gave an independent variable to use in the estimation, one of the earliest demonstrations of 
ratio estimation. The results of the check census were convincing on the usefulness of sampling. 
However, the entire effort was remarkable in many ways: 


• the effects of nonresponse from a voluntary census were anticipated;
• the use of ratio estimation;
• the speedy results.


Hansen, in an interview in Statistical Science (Olkin), reports that the registration took place 
the week of November 20, 1937; that the household canvass was done during the week of
December 4, 1937; and preliminary results became available on New Year’s Eve, 1937. I don’t 
think the Census Bureau could beat that record now. 
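
The ratio estimation used in the check census can be illustrated with a minimal sketch. It assumes, purely for illustration, that each sampled postal route supplies both an enumerated (check census) count and a voluntary registration count, and that the registration total is known for all routes. The function name and the figures are hypothetical and are not data from the 1937 study.

```python
def ratio_estimate(sample_enumerated, sample_registered, total_registered):
    """Ratio estimate of a population total.

    sample_enumerated : enumerated counts on the sampled routes
    sample_registered : voluntary registration counts on the same routes
    total_registered  : registration count summed over all routes
    (All three inputs are hypothetical illustrations.)
    """
    if len(sample_enumerated) != len(sample_registered):
        raise ValueError("the two samples must cover the same routes")
    registered_in_sample = sum(sample_registered)
    if registered_in_sample == 0:
        raise ValueError("auxiliary counts sum to zero; ratio is undefined")
    # Ratio of enumerated to registered persons on the sampled routes.
    ratio = sum(sample_enumerated) / registered_in_sample
    # Scale the known registration total by that ratio.
    return total_registered * ratio


# Three hypothetical sample routes and a hypothetical national registration total.
estimate = ratio_estimate([120, 80, 100], [90, 60, 75], 9000)
```

The estimate gains precision when the enumerated and registered counts are strongly correlated across routes, which is why an auxiliary count available for every route was so valuable.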

Hansen regards the success of the 1937 enumerative check census, as a demonstration of
the use of sampling, as key to gaining acceptance within the Bureau. Before then, Bureau staff
believed that complete coverage was necessary and that sampling would discredit the Bureau. 
The success of the study helped gain the acceptance of sampling in the 1940 census, the first 
census in which some questions were asked of only a sample, not the entire population. Unfor- 
tunately, in the last few months, some at Census have dragged out the old chestnut about 
needing to do the vacant delete check on a 100% basis because a census has less error than 
a survey. Let’s just assume that was a temporary aberration caused by litigation. 

A great deal of the theory of sampling was developed in conjunction with the Labor Force 
Survey. The Works Progress Administration (WPA) sponsored a survey to measure unemploy- 
ment. In 1942, when the WPA was abolished, the survey was moved to the Census Bureau. 
The sampling procedures were evaluated and many improvements were made. Several impor- 
tant contributions to sampling theory came from that revision. Some of the sampling prin- 
ciples introduced into the 1942 revision were: enlarged primary sampling units, sampling with 
probabilities proportionate to a measure of size, and area substratification. These principles 
were discussed in a 1943 paper by Hansen and Hurwitz in the Annals of Mathematical Statistics. 
Rereading this paper, “On The Theory Of Sampling From Finite Populations,” always
provides new insights. The article seems to be the first published by federal employees on the topic
of sampling of finite populations. Though the concepts had been discussed by others, the 
extension of theory was new. Also, as was a hallmark of Hansen and Hurwitz, the results were
discussed in a series of practical comparisons highlighting the advantages of the recommended 
procedures. 

Improvements in the Labor Force Survey continued over the years. Composite estimation, 
using the system of sample rotation to improve the estimates, was introduced. The Current 
Population Survey, as the Labor Force Survey is now called, has undoubtedly led the way 
throughout the world in setting the standards for a labor force survey. 
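
As a rough illustration of how sample rotation can be exploited, the sketch below shows one common textbook form of a composite estimator: a weighted average of the current month's direct estimate and last month's composite estimate carried forward by the change measured on the households common to both months. The weight `k`, the function name, and the numbers are assumptions made for illustration; this is not claimed to be the exact estimator used in the Labor Force Survey.

```python
def composite_estimate(prev_composite, current_estimate, change_from_overlap, k=0.5):
    """One step of a simple composite estimator (illustrative form only).

    prev_composite      : composite estimate for the previous month
    current_estimate    : direct estimate from the full current sample
    change_from_overlap : month-to-month change estimated from the rotation
                          groups that are in sample in both months
    k                   : weight on the carried-forward estimate (assumed value)
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie between 0 and 1")
    carried_forward = prev_composite + change_from_overlap
    return (1.0 - k) * current_estimate + k * carried_forward


# Hypothetical figures: last month's composite, this month's direct estimate,
# and the change measured on the overlapping part of the sample.
this_month = composite_estimate(100.0, 104.0, 3.0, k=0.5)
```

Setting `k` to zero reduces the composite to the ordinary direct estimate, so the weight controls how much is borrowed from the overlapping sample.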

Surveys of business establishments presented new sampling problems, also undertaken by 
the Statistical Research Division. The attitude frequently encountered was that sampling might 
be all right with relatively homogeneous populations such as people but it would not work
with highly skewed populations such as businesses. Working with the acknowledged skewness 
of the population, the sampling group stratified the retail stores by size. The largest stores were 
necessarily included in the sample, and the smaller businesses were sampled with probability 
proportionate to a measure of size. 
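
The design just described, with the largest units taken with certainty and the rest drawn with probability proportionate to size, can be sketched as follows. This is a minimal illustration under stated assumptions: the size measures are hypothetical, systematic selection stands in for whatever selection scheme the Bureau actually used, and all function names are invented for the example.

```python
import random


def split_certainty(sizes, n):
    """Separate units too large to be left to chance from the remaining units."""
    certainty, remaining = [], list(range(len(sizes)))
    while len(certainty) < n and remaining:
        n_left = n - len(certainty)
        total = sum(sizes[i] for i in remaining)
        # A unit whose selection probability would reach 1 is taken with certainty.
        big = [i for i in remaining if n_left * sizes[i] >= total]
        if not big:
            break
        certainty.extend(big)
        remaining = [i for i in remaining if i not in big]
    return certainty, remaining


def pps_systematic(units, sizes, n, rng):
    """Systematic selection of n units with probability proportionate to size."""
    total = sum(sizes[i] for i in units)
    step = total / n
    start = rng.uniform(0, step)
    chosen, cumulative, k = [], 0.0, 0
    for i in units:
        cumulative += sizes[i]
        # Select the unit whose cumulated size interval contains the next point.
        while k < n and start + k * step < cumulative:
            chosen.append(i)
            k += 1
    return chosen


def select_sample(sizes, n, seed=0):
    """Return (certainty units, PPS-sampled units) for positive size measures."""
    rng = random.Random(seed)
    certainty, remaining = split_certainty(sizes, n)
    n_left = n - len(certainty)
    rest = pps_systematic(remaining, sizes, n_left, rng) if n_left > 0 and remaining else []
    return certainty, rest


# Hypothetical store sizes: one very large store and several small ones.
store_sizes = [5000, 40, 30, 20, 10, 60, 25, 15]
certainty_units, sampled_units = select_sample(store_sizes, n=3)
```

Removing the certainty units first keeps every remaining selection probability below one, so no small store can be drawn twice by the systematic pass.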

It was also apparent that businesses came into being and died frequently. A static sample 
would not be able to capture this turnover. Therefore, an area sample to provide estimates 
for new stores was incorporated. The Monthly Retail Trade Survey has seen many
innovations, but these basic cornerstones remain. The Retail Trade Survey also makes use of
composite estimation to provide more precise estimates.

Many other instances of sampling innovations could be mentioned. Many are described,
together with the theory and practical applications, in the two-volume book Sample Survey
Methods and Theory by Hansen, Hurwitz, and Madow (1953). Though the
illustrations are seriously outdated, the books still provide more practical sampling applica- 
tions than any other books I know of. I only regret that they were never updated. 


3. NON-SAMPLING ERROR 


Another major advance in sample surveys and censuses was to look beyond sampling error 
to try to control the errors arising from other sources, such as the interviewers, processors, 
questionnaires, and so forth. Hansen and Hurwitz moved in that direction before the 1950 
Census, incorporating many experimental studies in the census designed to estimate the effect 
of measurement errors in the census. Total survey error became a strong focus at the Census 
Bureau. The measurement and control of nonsampling errors became a regular feature of 
Census Bureau work. 

An impetus to this nonsampling error work was the recognition that measurement errors 
could have a much stronger effect on data than sampling errors, especially at larger levels of 
aggregation. Hansen, Hurwitz and Bershad (1961) developed an integrated model for censuses 
and surveys that explicitly incorporated sampling error, response error, and bias. The response 
error component contained what are now known as a simple response variance and a correlated 
response variance. The simple response variance reflects the basic trial-to-trial variability that 
arises from differences in respondent reporting, different respondents, different interviewers, 
and the like. The term has also been generalized to include the variance that arises from trial- 
to-trial variability in coding. The correlated response variance refers to the variance that arises 
from a factor that pushes responses into a certain pattern. The most studied factor is that of 
the interviewer. By having certain expectations or from experience interviewing at a few 
households, the interviewer can push responses into certain categories. We see wide variability 
among interviewers working in the same areas on nonresponse rates, on questions about educa- 
tional attainment, and many other items. 
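
To see why the correlated component matters, consider a common textbook simplification, which is not necessarily the exact formulation of Hansen, Hurwitz and Bershad: if each interviewer handles the same number of respondents and responses within a workload share an intra-interviewer correlation `rho`, the response variance of a mean is the simple response variance divided by the sample size, multiplied by 1 + rho × (workload − 1). The names and numbers below are illustrative assumptions.

```python
def interviewer_inflation(rho, workload):
    """Factor by which correlated response error inflates the response variance."""
    return 1.0 + rho * (workload - 1)


def response_variance_of_mean(simple_response_variance, n_respondents, rho, workload):
    """Response variance of a sample mean under the equal-workload simplification."""
    base = simple_response_variance / n_respondents
    return base * interviewer_inflation(rho, workload)


# Even a small correlation doubles the response variance when each
# interviewer handles 51 respondents (hypothetical figures).
factor = interviewer_inflation(rho=0.02, workload=51)
variance = response_variance_of_mean(0.25, 1000, rho=0.02, workload=51)
```

Under self-enumeration each respondent in effect acts as his or her own interviewer, so the workload is one and the inflation factor falls to one, which is consistent with the reduction reported for the mail census.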

This model was first tested in the 1950 census and was a major factor in the decision to move 
from an ‘‘enumerator census’’ where an interviewer went to every household, asked the ques- 
tions, and recorded the answers, to a ‘‘mail census’’, where the questionnaires are sent to every 
household and householders are asked to fill out the forms and return them by mail. 
Experiments in the 1960 and 1970 censuses show a large reduction in this variance component 
when self-enumeration is used (U.S. Bureau of the Census 1968, 1970). 

In addition, Hansen and Hurwitz encouraged work on coverage error. The Census Bureau 
has invested a large amount of time in investigating the effects of coverage error, both in 
censuses and surveys. After the 1950 census, using a model developed by Ansley Coale at 
Princeton University, the Census Bureau was able to measure the amount of undercounting 
in the decennial census at the national level, by age, race, and sex. This method, known as 
demographic analysis, showed that there was a differential undercount that affected blacks 
much more severely than whites (Citro and Cohen 1985). In addition, the Census Bureau 
started development of a post-enumeration survey to learn more about the uncounted popula- 
tion. At first, the Bureau relied on a ‘‘do-it-better’’ approach, but in recent years has turned 
to a ‘‘do-it-again’’ approach. This latter emphasis will be used in the 1990 census. Similarly,
coverage losses in surveys spurred work on ratio estimation procedures that would dampen 
the effect. Most Bureau household surveys use those procedures. 

The Bureau of the Census now is well known for its work on measurement error. In addi- 
tion to work on response error and coverage, it has encouraged work on time-in-sample biases 
that affect the estimates from surveys in which respondents are contacted more than once. The 
labor force survey, in which respondents are kept in sample four successive months, dropped 
for eight months, and then contacted for four additional months, has been carefully studied. 
Bailar (1975) showed the difference between the higher estimates of employment and unemploy- 
ment for those in sample for the first time and those in sample for later times. These differences 
affect the levels of employment and unemployment, though probably not the estimates of 
month-to-month change. 
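
The rotation pattern described above (four months in sample, eight months out, four months in again) determines how much of the sample is shared between any two months. The sketch below works this out, assuming for illustration that one new rotation group enters every month and that all groups are the same size; the names are hypothetical.

```python
# Months since entry during which a rotation group is interviewed:
# months 0-3, then out for eight months, then months 12-15.
IN_SAMPLE_OFFSETS = set(range(0, 4)) | set(range(12, 16))


def groups_in_sample(month):
    """Entry months of the rotation groups interviewed in the given month."""
    return {month - offset for offset in IN_SAMPLE_OFFSETS}


def overlap_fraction(lag, month=100):
    """Share of one month's rotation groups that are also in sample `lag` months later."""
    now = groups_in_sample(month)
    later = groups_in_sample(month + lag)
    return len(now & later) / len(now)


month_to_month = overlap_fraction(1)
year_to_year = overlap_fraction(12)
```

Under these assumptions, three quarters of the sample is common to consecutive months and half is common to the same month a year apart, while one eighth of each month's sample is being interviewed for the first time.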

These are only a few examples of the work begun at the Census Bureau on measurement 
errors. Now work is carried on at all the statistical agencies. 


4. SEASONAL ADJUSTMENT 


The history of seasonal adjustment in the government began with the efforts of Julius Shiskin 
when he was at the Census Bureau. He was responsible for introducing computerized seasonal 
adjustment. Now the X-11 method is used around the world. 

According to Julie Shiskin, in the 1950’s the Federal agencies were under pressure from the 
Council of Economic Advisors to produce seasonally adjusted time series. The Census Bureau 
got the first electronic computer dedicated to data processing, the UNIVAC I, in 1953 and Julie 
heard a lot about how difficult it was to program from Eli Marks who was in his car pool. 
It dawned on Julie that the computer could be used for making the seasonal adjustments, so 
he checked with a computer technician and found that it would take 1 minute to do a 10-year 
series. Of course, it takes less than that now. 

Seasonal adjustment is still somewhat of an art form, since the X-11 program provides so 
many options, and the analyst can choose among them. However, there was skepticism at the 
beginning of this computerization about whether a machine could do what a skilled techni- 
cian could. Julie decided to challenge the Federal Reserve Board. He proposed that they take 
any series and spend as much time as they wanted adjusting it. Then he would run the same 
series through the computer. Both series would be plotted and given, without identification 
of who did the adjustment, to a small, very distinguished group at the Federal Reserve Board 
who would judge the results. The result was a unanimous decision that the computer method 
was superior. 

The government now seasonally adjusts thousands of time series annually. Model-based 
methods, because of computer limitations, seemed impractical for many years. Also, new sea- 
sonal adjustment factors were developed every year, based on historical experience. For 
example, a factor to be used in the computation of the seasonally adjusted figures for July 
would be developed in December of the preceding year. No new data based on more recent 
events were allowed to influence the adjustment. This made sense when it took several days 
to prepare punch cards and run the series. But within the last ten years, that method received 
more criticism and the method of concurrent seasonal adjustment was promoted. The time 
series staff at the Census Bureau, led by David Findley, did a thorough investigation of the 
merits of concurrent seasonal adjustment on Census Bureau series, and led the way for the 
adoption of that method by the Bureau. 

The time series staff has also asked some very key questions that are central to seasonal 
adjustment. First, what kind of standard exists to judge whether or not a series should be 
seasonally adjusted? Second, given that there are several methods for adjusting time series, 
how do you evaluate the different methods? In a key paper, Bell and Hillmer (1984) question 
the need for seasonal adjustment if series can be adequately modeled. They also describe some 
criteria for evaluating seasonal adjustments. I must be quick to point out that the Census Bureau 
is not the only government agency that has done ground-breaking work in this area. In fact, 
one very useful accomplishment of the time series staff at the Census Bureau is to hold regular 
meetings of interested and involved experts throughout the government. Thus, people at the 
Federal Reserve Board, Bureau of Labor Statistics, Energy Information Administration, and 
the Bureau of Economic Analysis, to name only a few, all participate and keep up-to-date on 
new developments. Estela Dagum at Statistics Canada has led many very successful efforts, 
including the development of the X-11-ARIMA method. 


5. DISCLOSURE AVOIDANCE 


Whether or not one agrees with the Census Bureau on its policies about keeping data 
confidential, one must agree that the Bureau has promoted disclosure avoidance techniques 
to protect data. Disclosure avoidance is an attempt to protect the answers of individual 
respondents. It has long been a problem in censuses, but is also a problem in surveys, especially 
surveys that are longitudinal in nature or where records exist that could be linked to the survey 
results. 

Disclosure avoidance problems in the population censuses focus on disclosures that would 
occur from the publication of very small frequencies. These small numbers lead to the poten- 
tial identification of single respondents or small groups of respondents. In addition, zeros in 
cells may also lead to disclosure. Disclosure in frequency tables is usually defined in terms of 
a threshold rule that states that disclosure occurs if, given any tabulation cell X, one can infer 
that the number of respondents in X is less than a predetermined threshold value. In 1980 
decennial census publications this predetermined threshold value was defined separately for 
households and persons. 
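
The threshold rule can be illustrated with a minimal sketch. The function name, the example table, and the threshold of 3 are assumptions made for illustration only; the thresholds actually used in the 1980 publications are not given here.

```python
def flag_small_cells(table, threshold, flag_zeros=False):
    """Return the (row, col) positions of cells whose count is below threshold.

    Zero cells are flagged only when flag_zeros is True, since zeros
    may or may not be treated as disclosures.
    """
    flagged = set()
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            if count == 0:
                if flag_zeros:
                    flagged.add((i, j))
            elif count < threshold:
                flagged.add((i, j))
    return flagged


# Hypothetical frequency table and threshold.
counts = [[12, 2, 7],
          [0, 5, 1]]
print(sorted(flag_small_cells(counts, threshold=3)))
```

The flagged cells are the primary disclosures; how they are then protected (suppression, perturbation, or intervals) is a separate choice.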

Methods for controlling disclosure in frequency count tables fall into three categories: 
suppressing cell values, perturbing cell values, and replacing numeric cell values by intervals. Cell 
suppression ensures that numeric values are not given and that inferences cannot be derived 
from manipulation of linear relationships between unpublished and published cell values. Data 
perturbation means adding or subtracting a small amount from most cell values so that infer- 
ences regarding the tabulated values cannot be made with certainty. The third method, replacing 
point estimates by intervals, is not useful for many data users for cross-classifications. 

Cell suppression was the main technique used by the Census Bureau through 1980. Additive 
restraints along rows and columns of the table generate a series of linear constraints. Once the 
primary disclosures have been suppressed, mathematical programming is used as a disclosure 
audit on the table. Though this method was used on an ad hoc basis for years, Cox and his 
colleagues at the Census Bureau derived the mathematical underpinnings (Causey, Cox and 
Ernst 1985) and showed how complex cell suppression actually was. 
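
Why suppressing the primary disclosure cells alone is not enough can be seen in a small sketch. This is only a simplified first-pass check, not the mathematical programming audit described above: it assumes that every row and column total is published, and it reports any suppressed cell that is the only suppression in its row or column and can therefore be recovered by subtraction. The function name and the 2 x 2 example are hypothetical.

```python
def exposed_cells(suppressed, n_rows, n_cols):
    """Suppressed cells recoverable by subtraction from a published total.

    Assumes all row and column totals are published. A cell is exposed
    if it is the only suppressed cell in its row or in its column.
    """
    exposed = set()
    for i in range(n_rows):
        in_row = [cell for cell in suppressed if cell[0] == i]
        if len(in_row) == 1:
            exposed.add(in_row[0])
    for j in range(n_cols):
        in_col = [cell for cell in suppressed if cell[1] == j]
        if len(in_col) == 1:
            exposed.add(in_col[0])
    return exposed


# A single primary suppression in a 2 x 2 table is fully recoverable.
print(exposed_cells({(0, 0)}, 2, 2))

# Adding complementary suppressions removes the exact recovery.
print(exposed_cells({(0, 0), (0, 1), (1, 0), (1, 1)}, 2, 2))
```

Passing this check does not mean that a table is safe. Bounds on the suppressed values may still be narrow, which is why a full audit relies on the linear constraints as a whole.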

Data perturbation methods, including random rounding, have been developed and used in 
the United Kingdom, Sweden, and Canada. All of these methods depend on adding or sub- 
tracting a small value, sometimes zero, from table cells, with a specified probability. 
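
One common variant, unbiased random rounding to a fixed base, can be sketched as follows. The base of 5 and the function name are assumptions for illustration; the particular schemes used in the countries named above may differ in detail.

```python
import random


def random_round(count, base=5, rng=random):
    """Round count to an adjacent multiple of base, unbiased in expectation.

    A count that is already a multiple of base is left unchanged
    (the perturbation is zero). Otherwise it is rounded up with
    probability remainder/base and down with the complementary probability.
    """
    remainder = count % base
    if remainder == 0:
        return count
    lower = count - remainder
    if rng.random() < remainder / base:
        return lower + base
    return lower


rng = random.Random(1)  # fixed seed so the illustration is repeatable
draws = [random_round(7, base=5, rng=rng) for _ in range(2000)]
print(set(draws), sum(draws) / len(draws))
```

Each published value is a multiple of the base, so a user cannot tell with certainty what the true count was, while the expected value of the published cell equals the true count.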

For data such as sales, value, inventory, and financial information from manufacturing and 
retail establishments, the Census Bureau is concerned that the amounts reported by individual 
respondents could be identified. If a competitor reviews a tabulation and subtracts the amount for his firm, 
the amount for another respondent may be identified. Cell suppression techniques are used. 
The so-called (n, k)-rule states that X is a disclosure cell if a fixed number of respondents n 
account for more than a fixed percentage k of the total cell value. This rule belongs to a 
class of cell dominance rules, all of which are additive. 
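
A dominance rule of this kind can be written in a few lines. The sketch below is illustrative only: the function name, the contribution values, and the parameter choices n = 1 or 2 and k = 80 are assumptions, not values used by the Census Bureau.

```python
def is_disclosure_cell(contributions, n, k_percent):
    """True if the n largest contributions exceed k_percent of the cell total."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    # Integer comparison avoids floating-point rounding at the boundary.
    return 100 * top_n > k_percent * total


# Hypothetical sales figures for the establishments in two cells.
print(is_disclosure_cell([900, 50, 30, 20], n=1, k_percent=80))
print(is_disclosure_cell([300, 300, 200, 200], n=2, k_percent=80))
```

In the first cell one respondent accounts for 90 percent of the total, so the cell would be suppressed; in the second the two largest respondents account for 60 percent, so it would not.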

Disclosure avoidance work is going on all over the world, primarily in government offices. 
No doubt this reflects the fact that these offices have serious problems that have been pushed 
to the fore by the demand for microdata. 


6. A LOOK TO THE FUTURE 


All four areas presented so far have relied on the development of mathematical models. 
Sampling, of course, relies on randomization methods, but the control of total survey error 
led to the formulation of a survey error model, first described by Hansen, Hurwitz, and 
Bershad (1961). That model and the experiments used to estimate the parameters were the basis 
for many policy decisions on the conduct of censuses and surveys. 

Time series models are used widely around the world, replacing empirical methods such as 
the X-11. Researchers are now urging that time series methods become integrated with survey 
estimation methods to produce more accurate results. It will be interesting to observe how or 
whether this melding will take place. 

Another area of active modeling within government agencies is to produce small-area data. 
Data are often collected for larger areas of aggregation, such as states, and then data needs 
are expressed for smaller areas, such as counties. Conferences have been held comparing and 
evaluating different techniques for producing small-area data. The Census Bureau used 
empirical methods to develop population estimates during the decade. Several models were 
explored as part of the undercount research at the Census Bureau, and much was learned about 
the problem. 

Ad hoc methods for editing and imputation are now being carefully scrutinized and 
mathematical models are being developed. We shall undoubtedly see more modeling of this 
type in the future. 

Thus, the future, as I see it, will be a further expansion of models. This is not to denigrate 
the empirical methods used now. Statisticians have always recognized that theory and prac- 
tice go hand in hand. Empirical methods that seem to work lead to modeling and theoretical 
developments that are tempered by practical experience. The government agencies have many 
fascinating statistical problems that will lead the way, as they have in the past, in certain areas 
of statistical methodology. 

COMMENT 


G.J. BRACKSTONE 


1. Introduction 


This paper confirms the significant contributions to statistical methodology made by the 
Bureau of the Census over the past 50 years. The four examples chosen by Bailar to illustrate 
these contributions are striking, not only in their intrinsic importance, but also in their variety. 
These are not variations of a single methodological breakthrough; they are fundamental con- 
tributions in four distinct areas. They are perhaps themselves illustrative of the wide variety 
and challenging nature of methodological problems faced by government statistical agencies - 
a variety and level of challenge that belie any suggestion that government statistics involves 
only the routine and the mundane. 

Of particular interest in the description of these examples are the insights into the 
environments in which these developments came about. While the methodological contribu- 
tions have themselves yielded benefits far beyond the original problems they were designed 
to address, the processes that led to these original contributions are themselves worthy of 
attention to identify the circumstances that need to exist to make such breakthroughs possible. 
I will return to this theme below. 

During this same period, the Bureau of the Census was also making significant contribu- 
tions to the automation of statistical processes. Having pioneered the development of punched 
card sorting and tabulating equipment in the earlier part of the century, the Bureau of the Census 
was responsible for the introduction of the first computer into a statistical agency in the 1950s. 
Subsequently in the 1960s, the Bureau also led the way in the automation of data entry by 
developing FOSDIC, a device for reading a microfilm copy of a marked questionnaire. Clearly 
the innovative contributions of the Bureau of the Census permeate many aspects of the work 
of government statistical agencies. 


2. The Diffusion of New Methodology 


Each of the contributions to statistical methodology described by Bailar originated with a 
real practical problem faced by a statistical agency. The need to collect additional data at 
reasonable cost and with acceptable timeliness motivated the development of sampling methods; 
the need to improve data quality by understanding, measuring and reducing non-sampling errors 
led to work in this area; seasonal adjustment developments seem to have been prompted by 
a need to speed up and standardize a skilled manual procedure; the problem of defining a 
rational and efficient process for ensuring the confidentiality of individual information in 
statistical outputs inspired the research on disclosure avoidance. Each of the many other 
examples that could have been cited shares this characteristic of having had a real practical 
problem as catalyst. 

The successful development of statistical methodology to address problems such as these 
is clearly of direct benefit to the statistical agency involved. But have these contributions had 
benefits more broadly? Have they added to the body of knowledge and methodology known 
as Statistics? It will be argued that these developments have had significant and broad benefits 
to statistical agencies engaged in the production of social and economic data, but that their 
impact on the subject of Statistics as treated in universities, while growing, has not been as 
influential as it might have been. 

Firstly, consider other government statistical agencies. In most countries the government 
statistical agency is a unique organization dealing with the problems of running regular large 
household and business surveys, integrating data from various sources, maintaining and 
analyzing time series, and making large volumes of data available to the public. (In this respect 
the United States is an exception in having several major organizations involved in this type 
of activity in different subject areas.) In most countries, therefore, statistical agencies have 
to look abroad for experiences similar to their own and for peer discussion and review. The 
network of interaction between government statistical agencies is extensive among developed 
countries. Contacts may be bilateral or multilateral. The long-standing and continuing exchange 
of information and experience between Statistics Canada and the U.S. Bureau of the Census 
is an example of the former. Statistics Canada has benefitted greatly from being able to adopt, 
and in some cases extend, statistical methodologies developed at the Bureau of the Census, 
including all of those described by Bailar; equally, I believe, the Bureau of the Census has 
benefitted from methodological developments at Statistics Canada. 


On the multilateral level, several organizations provide regular fora for the exchange of infor- 
mation between statisticians in government agencies. These include the United Nations and 
its regional and specialized bodies, the International Statistical Institute, particularly its sec- 
tions for Survey Statisticians and Official Statistics, and the professional statistical societies 
of several countries. In addition, both U.S.B.C. and Statistics Canada have instituted annual 
symposia or research conferences at which new developments and experiences are exchanged. 
All in all, this mixture of bilateral and multilateral contacts serves well to ensure that contribu- 
tions to statistical methodology emanating from any agency - and many agencies are making 
significant contributions - are freely shared and utilized in other agencies. 


But what has been the impact of such developments on the statistical profession outside 
government statistical agencies? Here we will use the specific examples cited by Bailar for 
illustration, though there are many other areas (some of them listed in Section 4) for which 
similar arguments would apply. In the case of sampling, the influence on the profession has 
been far-reaching. The topic of sampling from finite populations is now an established part 
of many university statistics curricula and is the subject of numerous textbooks. The 
developments initiated in a government statistical agency have been absorbed and extended 
by the profession. Indeed, some might argue that they have in some respects been taken far 
beyond the practical needs of survey-takers. In the case of non-sampling errors, the story is 
different. These developments have not yet led to a well-established body of theory and 
methods. That is not to say there have been no developments. On the contrary, there has been 
a wealth of work. However, much of it has been survey specific. It has improved, one hopes, 
many individual surveys, documented a great deal of experience, and generated a certain 
amount of applicable wisdom. But the topic has not yet found a secure niche in statistics cur- 
ricula. Indeed, the accrued wisdom is often associated with particular areas of application 
(sociology, demography, etc.) rather than with Statistics as a subject. 


Seasonal adjustment provides yet another story. With its origins as a rather empirical process 
used in statistical agencies, it has attracted increasing attention in recent years with attempts 
to provide it with a sound statistical basis. Bailar refers to some fundamental questions about 
objectives and yardsticks for seasonal adjustment that are now being addressed. Model-based 
alternatives to the traditional X-11 approaches are also being investigated. This is an area of 
statistical research that has attracted attention among time series experts in universities. Seasonal 
adjustment techniques clearly have applications well beyond government statistical agencies. 


Finally, the most recent example that Bailar describes is disclosure avoidance. This is a 
problem largely confined to agencies operating under a confidentiality code that prohibits 
divulgence of any identifiable individual information. Most of the research in this area is taking 
place in statistical agencies. The tools being used, however, tend to be from the fields of com- 
puter science, numerical analysis and mathematics. This is a relatively new field that has not 
yet attracted much attention outside government statistical agencies. 

These examples show that methodological contributions from government statistical agencies 
not only solve problems for these agencies but can also lead to significant advances in the field 
of statistics more generally. Of course, not all such contributions have wide applicability and 
some may remain confined essentially to statistical agencies. A continuing challenge for govern- 
ment statisticians is to generate interest among other statisticians, particularly those in univer- 
sities, in research problems arising in government work. 


3. An Environment for Innovation 


Innovative contributions rarely arise by chance. A suitable environment that allows ideas 
to develop and research to flourish is required. This is not always easy within an organization 
whose primary mission is the regular dissemination of data according to pre-determined 
schedules. Bailar refers to three reasons given by Hansen why sampling was accepted relatively 
quickly in the Census Bureau. In essence, these same three reasons define prerequisites for an 
innovative research environment in a statistical agency: 


(a) management support in the sense of a willingness to invest in research activity; 


(b) co-operative clients in the sense that successful research needs a particular application that 
represents the initial problem and sets the research schedule - the manager of this program 
has to be an enthusiastic guinea pig; 


(c) competent research staff, not just in terms of expertise in particular areas, but also in terms 
of the ability to recognize problems susceptible to generalization and solution through 
statistical methodology. 


While these three conditions will help to provide an environment conducive to research, 
further effort may be required to ensure that research results are in fact used, and used 
appropriately. This requires persuasiveness and good communication skills on the part of the 
statistician, as well as adequate institutional support for the new methodology. 


4. Other Contributions 


Bailar was not trying to be exhaustive in her examples of contributions to statistical meth- 
odology. It is worth noting some other areas of statistical methodology in which statistical 
agencies have made significant contributions. Some of these are mentioned as future topics 
by Bailar, but pivotal contributions have already been made. The following areas would find 
a place on a Statistics Canada list. 


(a) Methods for analyzing data from complex surveys. Of great relevance to users of most 
government statistics, these methods aim to adapt or replace traditional methods of 
statistical analysis that assume simple random sampling. This is an area of work that has 
attracted the interest of university researchers who have also made many contributions to 
the topic. 


(b) Record linkage. This technique is used in deriving statistics from administrative records, 
in micro-matches to assess quality, and in list frame maintenance. The development of a 
general theory for record linkage has provided a basis for software to support this activity. 
Most of the work on this topic has emanated from statistical agencies. 

(c) Editing and imputation. Widely used in many surveys, this technique lacked a sound 
statistical basis until theory was developed in the 1970s. Since then methodologies and 
systems have been developed to provide general facilities for performing these functions 
in a variety of surveys. This topic has generated substantial interest and further work out- 
side statistical offices. 


(d) Small area estimation. In recent years the production of estimates for areas smaller than 
could be supported by direct estimation from sample surveys has received increasing atten- 
tion. Statistical offices have developed a variety of methods to address this problem and 
university researchers have participated actively in this work. To date the utilization of 
such methods for production purposes has been limited, partly due to lingering concern 
about the probity of government agencies producing model-based estimates. 


(e) Statistical use of administrative data. As another means of reducing data collection costs, 
the statistical potential of existing administrative records has been exploited. Such sources 
present a different array of coverage and data quality problems from those experienced 
in surveys. While administrative data may be used alone to produce statistical data, they 
may more effectively be used in combination with survey or census data in estimation 
systems that take advantage of the relative strengths of each. Most of this work has taken 
place in government statistical agencies. 


5. Future Areas 


In looking to the future, Bailar foresees increased use of models. This is almost certainly 
correct as statistical agencies strive to extract the maximum information out of existing data 
and minimize the increasing costs of data collection. In particular, she refers to the melding 
of time series methods with survey estimation methods, an area now being explored in several 
statistical agencies. I would add three other domains of activity in which we might look for- 
ward to significant developments in the long run, each of them requiring an interaction of 
statistics with other disciplines. 

The first is the application of expert systems to certain activities in government statistical 
agencies. To use an example already discussed, the choice of the appropriate options or models 
to use in seasonally adjusting a time series could well lend itself to such an approach. The second 
area is the use of cognitive methods for understanding and improving the response process. 
Work in this area is underway at a number of statistical agencies. Drawing on the expertise 
of psychology, it may provide a basis for enabling statisticians to develop better models of the 
response process - probably the least well understood component of the survey process. The 
third area is the development of integrated statistical information systems that combine models 
of social or economic systems with databases on which the impact of different policy assump- 
tions can be simulated. Such systems serve to facilitate the use of an agency’s data for policy 
analysis, and also help it to recognize data gaps in current programs. 

To echo Bailar’s conclusion, the problems are fascinating and there are more than enough 
to go around. 

Rolling Samples and Censuses 
LESLIE KISH 


ABSTRACT 


Rolling censuses combine F nonoverlapping periodic samples of 1/F each, so designed that cumulating 
the F periods yields a complete census of the whole population area with F/F = 1. Intermediate cumula- 
tions of k samples would yield samples of k/F for more timely uses (annual or quinquennial censuses). 
Area sampling frames would cover the national territory for naturally mobile populations. These methods 
may often be preferable to other alternative methods for censuses, also discussed. Asymmetrical cumula- 
tions are also recommended to counter the problems of small sample cells for area domains (provinces, 
regions, states) common to most countries and to other population units. Split-panel designs offer another 
use for cumulating periodic surveys by combining nonoverlapping portions a — b — c — d — with 
panels p for partial overlaps, pa — pb — pc — pd —, for multipurpose designs. 


KEY WORDS: Periodic samples; Time sampling; Cumulations; Split-panel designs; Asymmetrical 
cumulations; Multipurpose designs. 


1. INTRODUCTION AND DESCRIPTIONS 


Several uses and methods for cumulating data from periodic samples are discussed below. 
This has been a rather neglected subject, as the literature on periodic and rotating samples has 
concentrated on the statistics for net changes and for current (“cross section”) estimates, not on 
cumulations. The first concern here is with rolling censuses and samples, and let me attempt a 
definition of rolling censuses: a combined (joint) design of F separate (nonoverlapping) periodic 
samples, each a probability sample with fraction f = 1/F of the entire population, so designed 
that the cumulation of the F periods yields a detailed census of the whole population with 
f' = F/F = 1. Intermediate cumulations of k < F periods should yield rolling samples with 
f' = k/F and with details intermediate between 1 and F periods. We may appreciate that 
definition by looking at examples and counterexamples. We shall also examine possible variations 
that would satisfy the definition and conflicting needs that rolling samples can be aimed to meet. 

Imagine a weekly national sample, each with epsem selection rates of 1/520, and so designed 
that in 520 weeks they are “rolled over” the entire population and the cumulation yields a 
complete census of the population averaged over ten years. Each year would yield national and local 
samples with selection rates of 52/520 = 1/10. The design would combine weekly national 
samples into an averaged decennial complete census, and into sample censuses of ten percent 
each year. 
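
The arithmetic of this example can be checked with exact fractions. The sketch below is only an illustration of the definition; the function name is hypothetical, and the design is assumed to use F nonoverlapping periodic samples of fraction 1/F each.

```python
from fractions import Fraction


def cumulated_fraction(k, F):
    """Sampling fraction after cumulating k of F nonoverlapping periodic samples."""
    if not 0 <= k <= F:
        raise ValueError("k must lie between 0 and F")
    return Fraction(k, F)


F = 520                               # weekly samples over ten years
weekly = cumulated_fraction(1, F)     # one week
annual = cumulated_fraction(52, F)    # one year of weekly samples
decade = cumulated_fraction(520, F)   # the full cycle
print(weekly, annual, decade)
```

One year of cumulation gives the ten percent sample census, and the full cycle of 520 weeks gives a fraction of 1, that is, the complete rolling census.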

The Health Interview Surveys of the National Center for Health Statistics (1958) cumulate 
52 weekly samples of about 1,000 households each. These samples select about f = 1/80,000 
weekly; thus 520/80,000 represents cumulations of nonoverlapping periodic samples over ten 
years. But they are confined to a set of PSU’s for reasons of cost chiefly, but also for better 
estimates of net change and for current estimates. However, rolling samples may better be 
reserved for samples designed for maximizing (increasing) the spread (representation) of the 
samples cumulated over national (or broad) populations. The words in the parentheses indicate 
that rolling samples constitute a special case of the more general cumulated periodic samples 
and that the boundary of the subset need not be precisely clear. 

For overlapping between periodic surveys, the requirements for the selection of units of 
cumulated designs are diametrically opposed to the requirements for the objectives and substan- 
tive content of the interviews (the observations, variables). The content of the surveys must 
be as similar, standardized, identical as possible for the cumulations to be meaningful. Using 
periodic panels of the same elements for different contents could broaden the scope of surveys, 
but would not contribute to increasing the sample size for survey statistics. Most periodic surveys 
collect similar variables, though some may also have other contents attached at times. 
However, changes of methods, questions, and variables would cause conflicts and problems. 
Perhaps such changes should be introduced only with extended intervals of “splicing”, using both 
the new and the old methods to study the differences. These problems are fundamentally similar 
to those faced when measuring differences from periodic surveys, but they seem more novel. 
I insist (Section 6) that solutions to such problems must be tailored to specific situations. 

On the other hand, the cumulation of the same elements (persons, households) does not 
increase proportionately the sample size (base), and panels of the same elements would not 
help rolling samples. Many periodic surveys (e.g., labor force surveys of Canada, the USA 
etc.) have partly or largely overlapping fractions of segments (ultimate clusters), and those tend 
to contribute little toward increasing the sample size. Even in surveys with nonoverlapping 
segments (like the HIS of the NCHS (1985)), the segments are confined to the same first stage 
(and second-stage?) units; in these the positive correlations (clustering effects) tend to reduce 
the “effective” sample sizes for overall statistics. Furthermore, those periodic samples, 
confined to samples of primary units, fail to meet the needs of rolling samples for spreading over 
the entire (national?) population. 

A few more remarks may help to broaden our frame of reference. (1) The discussion often 
assumes area sampling, but the concept can be generalized to other frames. (2) Equal selec- 
tion rates for elements are often used, but cumulations may be modified to unequal selection 
probabilities. (3) The concept may be generalized from regular periodic samples to cumula- 
tions over less regular periods. (4) Cumulations over the entire time span (year or ten years) 
come most readily to mind, but we may envisage systematic sampling of the span; e.g., labor 
force surveys cover only single weeks of the months over the year. 


2. ALTERNATIVE METHODS FOR CENSUSES 


Rolling censuses would be expensive, and the reason for such an innovation should include 
the acknowledged relative weaknesses of the decennial censuses now widely used, and of sample 
surveys and administrative registers, which are proposed at times as possible alternatives. The 
chief reason for censuses is the need for detailed information, especially for small areas; and 
the chief weakness of decennial censuses is their obsolescence between censuses and their great 
total cost that prevents more frequent censuses. Sample surveys have many advantages for 
national statistics and for large regions, but they lack geographical and other details. Good 
registers are rare and they provide few variables beyond a few bare demographic data. 

Decennial censuses, first and foremost of population but also of housing, agriculture, industry 
and others, have spread into most countries in the last two centuries, and especially in the last 
two generations with the help of the United Nations Statistical Office. In addition to 
detailed data for small domains, censuses often may obtain better coverage than samples, due 
to the concentrated publicity and the national “ceremony” connected with censuses; the Chinese 
census of 1982 is a good example (Kish 1979, 1989). The efforts of the census also yield lower 
unit costs (for short forms) than surveys, but much higher total costs than sample surveys, 
because of much greater size. At $2.6 billion, the 1990 censuses of the USA will cost $10 per 
capita or $30 per household. That cost of about half to one hour of the median hourly wage 
per capita (once in ten years) seems to hold in international comparisons, though the number 
and complexity of census variables is one of the cost factors. Rolling censuses would probably 
be proposed and designed for surveys fairly rich in the numbers and complexity of variables. 
In Canada 260 weekly samples of 32,000 households would cumulate to the national popula- 
tion. In the USA 520 weekly samples of 160,000 would be needed by decennial cumulations 
to 80,000,000 households; the CPS surveys have 100,000 with state supplements. 
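
These sample sizes follow from simple division, as the sketch below shows. It assumes equal weekly samples and uses only the round figures quoted in the text; the function name is hypothetical.

```python
def weekly_sample_needed(total_households, n_weeks):
    """Smallest equal weekly sample whose cumulation covers every household."""
    return -(-total_households // n_weeks)  # ceiling division on integers


# USA: 80,000,000 households cumulated over 520 weeks (ten years).
usa_weekly = weekly_sample_needed(80_000_000, 520)

# Cumulations implied by the round weekly figures quoted in the text.
usa_cumulated = 160_000 * 520      # decennial cycle
canada_cumulated = 32_000 * 260    # five-year cycle
print(usa_weekly, usa_cumulated, canada_cumulated)
```

The exact requirement for the USA is a little under the quoted 160,000 per week, so the quoted figure is a rounded value that leaves some margin.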

No detailed comparison of decennial censuses with rolling censuses is possible here, but the 
issue of timeliness must be mentioned, because that is the chief issue in the comparison. Up 
to now the periods for using data from decennial censuses have varied from a start of 1-4 years 
to 14 years or more. Even with faster computers the start is slower for complex social statistics 
than for mere head counts; and the obsolescence over the ten intercensal years becomes worse 
with higher population mobility in our modern civilization. The biases due to obsolescence 
will be monotonic, if not linear, functions of elapsed time. The sizes of the biases will differ 
with variables, populations, etc.; but they will be present and considerable, I believe; often 
perhaps greater even than the famous biases due to undercoverage (Kish 1981, 1979). 

Increasing and rapid obsolescence of decennial census data should chiefly motivate the 
searches for alternatives, such as in A Study on the Future of the Census of Population: 
Alternative Approaches (Redfern 1987). “A serious weakness of the census is that it occurs 
relatively infrequently”. About a “rolling census” it states: “The merit of this proposal is 
that ... a much smaller, better trained organization and more experienced staff could be 
deployed both for the fieldwork and for processing ... the public awareness of the rolling 
census would not be highly peaked. Whilst that might well lessen the risk of public protest, 
the reduced publicity would adversely affect the level of coverage achieved ... (The method) 
would complicate the interpretation of the census results, especially comparisons between areas. 
Simultaneous national coverage, one of the virtues of the census, would be lost. The idea of 
a rolling census has not yet been developed and applied’’. 

Most countries will probably still need censuses in 2000 AD. They are being replaced by 
population registers in the Nordic countries and still need to be introduced in some Third World 
countries in 1990. They have been stopped by opposition and by obstacles in a few. But most 
countries need and will have them in 1990. They have been a great and useful invention - like 
the steam locomotive, and at about the same time. However it is possible that the censuses 
also may be phased out gradually by some of the alternatives here considered. 

Quinquennial or annual censuses have been proposed, and quinquennial censuses have been
initiated or carried out in a few countries, including Canada and Turkey. But these are not 
destined for quick acceptance, I suspect. They seem too costly: ten percent samples in two coun- 
tries had half of the costs of complete censuses. Also they still leave a great deal of obsolescence. 
On the other hand, much smaller (e.g., 5 or 1 percent) yearly sample censuses would fail to 
offer enough geographic detail. The one percent ‘‘microcensus’’ of West Germany provides
yearly sample data. China had a one percent census in 1987; their yearly samples of 1/2,000 (also 
about 500,000 people) collect chiefly fertility data only (State Statistical Bureau 1987; Kish 
1989). Quinquennial censuses are not frequent enough and yearly censuses would be too costly. 

Administrative registers provide a great deal of diverse data in many countries, and they 
are likely to spread in the future. Excellent population registers exist in the Nordic countries 
of Sweden, Norway, Denmark, and Finland, and perhaps in some other countries of Northern 
Europe. Their completeness is based on cooperation, motivation (with social incentives), and 
literacy; in a few cases they are replacing censuses with data from the population registers. In 
other situations their coverage, quality, and updating are far from adequate. We can expect 
future improvements in the quality, spread, and use of population registers but not quickly and
not widely. We should not expect them to replace censuses even in developed countries like the 
USA and Canada, and their use in less developed countries soon is even less likely (Redfern 1989). 

Furthermore, even after population registers become adequate in quality and coverage, they 
will contain and supply only a few, bare demographic variables: head counts, age, sex and little 
more. Thus, they will fail to meet the demands of modern society for richer sources of statistics. 
For these the registers will serve only as auxiliary variables. 

Synthetic, ratio regression, and raking estimators are being used increasingly for small area 
statistics (Platek et al. 1987; Purcell and Kish 1980). Census data are usually obsolete, data 
from registers inadequate, and sample data lack details for small areas. The weaknesses and 
strengths of the three methods are complementary, hence combining the advantages of the three 
methods seems like good strategy. This is the common purpose of the several methods of small 
area estimation: to provide estimates for small areas and for other small domains that are cur- 
rent, accurate, and relevant. 

These methods are now being used for local area estimates of population counts for the 
intercensal years, in order to compensate for the obsolescence of the decennial censuses, thus
sometimes called postcensal estimates. They also have other uses in increasing numbers, e.g.,
they have been proposed to compensate for undercount biases. However, those methods have
all combined censuses with sample surveys and registers. Therefore, they should not yet be
considered as alternatives to censuses. Nevertheless, we may raise the question whether rolling 
censuses would perform better or worse overall than decennial censuses in those combinations. 
The answer is uncertain, but I believe that the balance of variance components would favor 
rolling censuses in most cases. However, theoretical as well as empirical investigations will be 
needed to decide this question as well as several others here. 

Partially overlapping samples from multipurpose designs must be considered because they 
exist in many countries for several purposes and they absorb some of the funds available for 
national statistics. These multipurpose surveys often provide labor force statistics and other 
valuable data. They vary in parameters between countries but they also have several basic 
features in common with those of the USA and Canada. They are periodic samples with overlaps 
that are constant and for fixed periods (but all three parameters differ between countries). They 
use area segments for bases, but not panels of households (movers are not followed). The over- 
laps are usually large and these are generally justified with references to reductions of variances 
from positive correlations in the overlaps. But an even greater advantage of overlaps may be the 
lower costs of interviewing in later calls, especially where telephone calls follow first calls on 
foot. These ‘‘rotation designs’’ have dominated practice and literature and they represent an
important innovation (by H.D. Patterson 1950 and R.J. Jessen 1942). They are designed for 
measuring net changes and current (level) statistics, but not for cumulations. However, the 
variances (per household) would not be greatly increased for overlaps of even a small fraction 
(< 0.3), when compared to the large overlap (> 0.7) commonly used. This is particularly true 
for many variables like being unemployed, which have low correlations between periods. Fur- 
thermore the overlaps could be changed in other ways (Section 5). Therefore it is possible that 
these surveys could be combined with the cumulations needed for rolling samples and censuses. 
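
The claim about small overlaps can be illustrated with a simple textbook model. The sketch below assumes equal sample sizes and element variances in two periods, an overlap fraction P, and a correlation R between periods within the overlap, so that the variance of a simple difference of means is proportional to 1 - PR. That formula, the function name, and the correlation values are assumptions brought in for illustration; they are not results stated in this paper.

```python
def change_variance_factor(overlap: float, corr: float) -> float:
    """Relative variance of a difference of two period means under the
    assumed simple model: var(diff) / (2 * S**2 / n) = 1 - overlap * corr."""
    return 1.0 - overlap * corr

# Hypothetical low and high period-to-period correlations.
for corr in (0.2, 0.8):
    small = change_variance_factor(0.3, corr)   # small overlap
    large = change_variance_factor(0.7, corr)   # large overlap, as commonly used
    print(f"R={corr}: variance at P=0.3 is {small / large:.2f} times that at P=0.7")
```

Under this model the penalty for a smaller overlap is modest when the correlation is low and becomes substantial only for highly correlated variables.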


3. CUMULATIONS OVER TIME AND SPACE


Changes in populations and in their variables are often recognized as of three kinds:
‘‘secular’’ trends, which are more or less smooth and monotonic, like ‘‘growth’’; periodic and
‘‘cyclical’’, such as seasonal fluctuations; and irregular variations which are difficult to describe
and often treated as ‘‘random’’. Designs for cumulating, averaging, and sampling over temporal
variations face psychological obstacles that differ from our acceptance of designs over
spatial variations. Spatial variations can be large and sometimes accountable, but
more often irregular. However, we have learned to accept samples, averages, and cumulations 
over them in population (national) aggregates and averages. 

The psychological blocks still facing rolling samples and censuses may be countered with 
both theoretical and pragmatic arguments. The theoretical and philosophical arguments are 
hinted at above and in later discussions of alternatives (Kish 1987, 6.1B). The pragmatic and 
empirical arguments may be buttressed with several types of uses we recognize as common and 
successful. The same periodic samples for obtaining current data and for measuring changes 
can also be used for aggregates needed for spatial and domain details. Furthermore, by 
averaging (over a year or longer) the temporal variations (seasonal or cyclical or erratic) are 
smoothed over in the moving averages. 
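
A minimal sketch of this smoothing, using invented monthly values with a purely seasonal pattern; the function name and the data are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving averages over `window` consecutive periods."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical monthly rates whose only variation is a seasonal cycle.
seasonal = [5, 6, 8, 9, 8, 6, 5, 4, 2, 1, 2, 4]
monthly = seasonal * 2                    # two years of monthly estimates
smoothed = moving_average(monthly, 12)    # 12-month moving averages
print(smoothed)
```

Because every 12-month window contains one full seasonal cycle, the moving averages are constant even though the monthly values range widely.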

Retrospective data. ‘‘Children ever born’’ to women who completed fertility over the entire 
fertile span of 30 years may represent an extreme for retrospective spans; but other individual 
interview data aggregated over life spans include serious diseases, education, etc. Interviews ag- 
gregated over yearly spans include farm production, work history, income, home and auto pur- 
chases. Of course, all these data have imperfections, which differ across variables, respondents, 
methods, etc. But even cumulations over a week or over a day (such as purchases of bread or
cigarettes) have errors. Multiround surveys are used for cumulating short term data; for example, 
births during the past month have been cumulated from 12 monthly samples over the year. 

Cumulating rare elements from periodic surveys has often been used to deal with these dif- 
ficult and expensive problems. The topic has been dealt with and illustrated in publications 
on rare items (Kish 1965 11.4; Kalton and Anderson 1986). Statistics for small domains may 
also benefit from cumulations, and single years of birth may exemplify such small domains, 
which consist of ‘‘crossclasses’’. But geographical and administrative units are ‘‘proper 
domains’’; for these the periodic samples are not adequate, because those domains need the 
designs of rolling samples or censuses. 

Cumulations from periodic samples. The Health Interview Survey (NCHS 1958), described 
above, may be the best known example with yearly cumulations of weekly samples of about 
1,000 households from nonoverlapping area segments. It is designed for multipurpose objec- 
tives (like most periodic surveys) including cumulations for some rare diseases, but also estimates 
of current levels and net changes. It provides some estimates for larger domains, as well as 
national estimates for the common diseases. To convert it into a rolling sample, by increasing 
the spread of the yearly samples, would increase field costs, especially in that portion (about 
30 percent only) where the PSU’s are counties (not self-representing). 

A traffic survey provides an interesting example of cumulations, because the population 
is very mobile within the sampling frame of sampling units of locations × hours (Kish, Lovejoy
and Rackow 1961). The general concept is applicable to nomads and other mobile populations. 
It may also serve less mobile general populations over a longer period, such as the decennial 
spread. 

The earliest cumulation I found is for a sample of California in 1952 (Mooney 1956). ‘‘The 
samples were selected in such a manner that they resulted in a uniform overall sampling rate 
of 1 in 385. For purposes of enumeration, the sample was divided into 52 equal subsamples, 
and a different subsample was enumerated during each week of the survey year. Consequently, 
each week’s enumeration was based on a sample of 1 in 20,020’’. For smaller states (populations)
and/or larger samples one may imagine weekly samples of 1/520, and complete rolling samples 
in the 520 weeks of the decennial census period. It is likely that such rolling samples have been 
designed for smaller populations. 
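
The sampling rates in this example follow from simple division, as the sketch below shows with exact fractions. The variable names are assumptions made for illustration.

```python
from fractions import Fraction

overall_rate = Fraction(1, 385)       # uniform yearly rate quoted from Mooney (1956)
weekly_rate = overall_rate / 52       # one of 52 equal subsamples per week
print(weekly_rate)

rolling_weekly = Fraction(1, 520)     # weekly rate imagined for a rolling sample
weeks_to_cover = 1 / rolling_weekly   # nonoverlapping weeks needed for full coverage
print(weeks_to_cover)
```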

The above examples refer to nonoverlapping periodic samples. Cumulations from partially
overlapping samples have been used, but with the ‘‘effective sample sizes’’ reduced by the 
amount of the overlap (Ericksen 1974). Furthermore, this paper concerns cumulations of indi- 
vidual cases, but periodic or repeated surveys may also be used for combining statistics from 
them (Kish 1987, 6.6) as in ‘‘meta-analysis’’. 


4. ASYMMETRICAL CUMULATIONS 


This term denotes a proposed method of cumulation for problems that arise because ‘‘natural’’
subpopulations generally vary greatly in size. For example, I have been faced within the
past few years with ranges of 50 or even 100 to 1 among the provinces (or states) of Canada, 
USA, Australia and China; and those ranges of relative sizes are similar for the provinces of 
most countries. Those inequalities arise because administrative units tend to be created roughly 
equal in areas, but spread over lands with highly unequal population densities. They also exist 
for districts, counties, etc. within most provinces. They also arise for other social units and 
social organizations, like firms, hospitals, universities. But not for all: military units, census 
enumeration districts and elementary schools are created roughly equal. 

For many other frequency distributions rough equalities of classes are created with traditionally
accepted cumulations over roughly logarithmic scales; e.g., income, city size, etc. are
often tabulated in classes like 10-25, 25-50, 50-100, 100-250, 250-500, 500-1,000, 1,000-2,500, 
etc. This shows a sensible method of cumulation that creates roughly equal cells on a roughly 
logarithmic scale, and they are traditionally accepted and understood, although highly 
asymmetrical. 
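
The class limits quoted above follow a repeating 1, 2.5, 5 pattern within each power of ten. A short sketch, with a hypothetical function name, generates them:

```python
def log_class_bounds(start_exponent: int, decades: int):
    """Class limits on a 1-2.5-5 pattern per power of ten, e.g. 10, 25, 50, 100, ..."""
    steps = (1, 2.5, 5)
    bounds = [m * 10 ** e
              for e in range(start_exponent, start_exponent + decades)
              for m in steps]
    bounds.append(10 ** (start_exponent + decades))   # closing limit
    return bounds

bounds = log_class_bounds(1, 2)                 # limits from 10 up to 1,000
classes = list(zip(bounds[:-1], bounds[1:]))    # (lower, upper) pairs
print(classes)
```

Successive limits differ by factors of 2.5, 2, and 2, so the classes are only roughly equal on a logarithmic scale, while their widths on the original scale grow a hundredfold across the table.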

Note also that cells in tables for sample data are generally cumulated over both space and 
time. For example, monthly surveys of labor force often show labor force statistics cumulated 
over the month (or over a week as a ‘‘sample’’ of the month), and also over the provinces (from
a sample of sampling units). Quarterly and yearly statistics show further cumulations, as do 
the national statistics. The spans of cumulations must balance three parameters of restraints: 
the span of the reference period that may be relatively flexible; the domains of subpopulations, 
which may be more rigid, like provinces; and the sample size expressed in sampling units and 
variance components. Other variables, such as cost factors and ‘‘required precisions’’, tend 
to be expressed through the basic three parameters of cell size. 

Decennial censuses of the population counts represent extremes by emphasizing locational 
detail: persons are placed in homes as of the reference date (April 1 in the USA). But yearly 
and longer cumulations are possible for income, etc. Time gets sacrificed in obsolescence, and 
sample sizes and costs in complete coverage. At the other extreme are monthly sample surveys 
for labor force and health variables, and myriad other variables, where the emphasis is placed 
on timeliness and reduced costs, but at great sacrifice of spatial detail.

Population inequalities between provinces impose severe restraints on timeliness and sample 
sizes. Often higher sampling rates are introduced for the smaller provinces, but such ‘‘optimal’’
selection rates bring disadvantages in increased variances both overall and for cross-provincial
‘‘crossclasses’’ (age, sex, etc.) (Kish 1988, Section 5; Trewin 1987). Thus those mildly unequal
rates fail to solve conflicts in provincial sizes of 50:1 or 100:1. 

Because of those conflicts the tables for monthly surveys commonly present cells for small 
provinces with inadequately small sample sizes. Two alternative procedures have been advanced 
and practiced for such small cells. A. Release the same data for small cells as for large cells, 
and let the reader (user, consumer) beware, caveat emptor, with perhaps warnings posted
to appendixes to sampling errors. B. Don’t release, but suppress small cells, leaving them blank,
after applying some declared curtailing limits. Readers may be directed to other released publica- 
tions, based on cumulated data (quarterly, annual). 

Asymmetrical cumulation proposes a compromise between symmetrical releases (A) and 
asymmetrical suppression (B). 

C. Asymmetrical cumulation proposes to release for small cells the specified cumulations 
of periodic data. These cumulations may be flexible: for example, quarterly for small cells and 
yearly for very small cells, instead of the monthly data for large cells. The readers may be 
notified (with * or italics or other signs); thus they may choose either C (cumulation) or B 
(disregard). 

AC. This procedure would allow readers to choose either A or B or C by publishing both 
the current monthly data A and the cumulated C data. 

Procedures B and C have the disadvantage that the cells do not sum to the marginals. But 
AC like A do sum to the marginals. Some iterative method could overcome these disadvan- 
tages of B and C. 
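
Procedure C can be sketched as a rule that picks the span of cumulation from the size of the cell. The thresholds, function names, and data below are hypothetical; the paper does not specify curtailing limits.

```python
def cumulation_span(monthly_sample_size: int,
                    small: int = 400, very_small: int = 100) -> int:
    """Months of data to cumulate for a cell (thresholds are hypothetical)."""
    if monthly_sample_size < very_small:
        return 12      # yearly cumulation for very small cells
    if monthly_sample_size < small:
        return 3       # quarterly cumulation for small cells
    return 1           # current monthly data for large cells

def published_estimate(monthly_counts, monthly_sizes):
    """Pooled rate over the latest span, chosen from the latest monthly cell size."""
    span = cumulation_span(monthly_sizes[-1])
    hits = sum(monthly_counts[-span:])
    size = sum(monthly_sizes[-span:])
    return span, hits / size

# Hypothetical small province: about 50 sample persons per month.
counts = [4, 5, 3, 6, 4, 5, 5, 4, 6, 3, 5, 4]
sizes = [50] * 12
print(published_estimate(counts, sizes))
```

A published table would flag the cells whose span exceeds one month, as the text suggests. Making such cells sum to the marginals would need a further adjustment step that this sketch does not attempt.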


5. MULTIPURPOSE SPLIT PANEL DESIGNS (SPD)


In order to find adequate funds for rolling samples and censuses it is desirable to consider 
how they could be combined with the periodic surveys now being funded and conducted in 
many countries. These are either monthly or quarterly surveys (sometimes yearly or weekly). 
They are typically partially overlapping samples designed for improved estimates for current 
level and net changes. However they are not designed either for cumulated rolling samples, 
or for panel studies based in the overlaps. I proposed SPD as the design for providing data 
for all those four purposes; and also for some fringe benefits (Kish 1987, 6.5). 

a. Combining two separate periodic samples forms the basis of SPD: to add a panel p to 
a parallel series of nonoverlapping samples a — b — c — d etc., with the combination then
denoted as pa — pb — pc — pd etc. The panel p provides individual (micro) changes and the
nonoverlaps can be cumulated into larger samples and rolling samples. The combined samples 
provide the partial overlaps best for current estimates and for net changes; thus they can replace 
the usual rotating samples. This combined use is a main feature of SPD, together with the
provision of a flexible and potentially large nonoverlapping portion for use in cumulating
samples. 

b. The designs for p and for a — b — c can be separate and distinct, each ‘‘optimized’’
for its own objective. But they must also be combined for joint estimates of net changes and 
current levels; and for that purpose the populations covered and the measurements used must 
be similar enough for the combination. 

c. SPD has considerable advantages because its overlaps exist for all periods, whereas they 
are rigidly fixed in classical rotation designs. This advantage is clear and important for net 
changes because it exists for all desired comparisons. But it also exists for current levels, because 
the correlations may differ among variables. 

d. Including proper panels p of elements necessary for measuring individual (micro or gross) 
changes would be a great advantage for SPD over partial overlaps now used. However, the 
other features can be satisfied with overlaps p’ of area segments as at present. Furthermore 
a modest and slow rotation can be built into the design of either the panel p or the overlap 
p’, so as to retain most of the gains from covariances and from panel information. Perhaps 
some alternation may be introduced to reduce panel fatigue or deterioration. Several surveys 
have used both the overlap p’ and panel p by following as many movers as possible. Most 
households belong to both samples. The extra cost for the panel depends on the proportion
of movers and their cost (Kish 1987, 6.2, 6.4). 

e. The advantages and problems of panel interviewing raise difficult questions, with a large
and varied literature and conflicting results (Kish 1987, Sections 6.4, 6.5). The number and 
spacing of reinterviews that are possible, desirable, and reliable need to be established. 

SPD has an advantage in separating the panel p whose cumulated data may be checked 
against the nonoverlaps for ‘‘panel biases’’, and perhaps even for adjustments of biases when 
those are measured adequately. 

Another useful modification may be to recruit sampling units into the panel by different 
(‘‘optimal’’) selection rates on the basis of their being ‘‘screened’’ in the nonoverlaps.

f. The size of a — b — c — d need not always be the same; this flexibility of SPD, which
differs from the rigidity of rotating designs, may be used for needed sample enlargements or 
for cost retrenchments. Such changes would raise weighting problems (solvable) for 
cumulations. 

g. The relative size of the panel p against the nonoverlap a — b — c — d portions depends
on feasibilities and costs and needs study (Section 6). For individual changes we need larger 
p, but for cumulations larger a — b — c — d. The larger p portions now common may be 
favored by lower field costs for telephone reinterviews. 

Lower values of p than are now common are good enough for current levels and for net 
changes with weighted estimates; the optima are insensitive and p between 1/4 and 1/2 are all 
nearly best; lower p may also be used where the emphasis lies in nonoverlaps a — b — c — d
for cumulations. 
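
The structure described in points a to g can be made concrete with a toy sketch. The unit labels, sample sizes, and function name are hypothetical; the sketch shows only that the overlap between any two periods is the panel p, while the nonoverlapping parts cumulate without duplication.

```python
def split_panel_samples(panel, fresh_by_period):
    """Sample for each period = the fixed panel p plus that period's fresh units."""
    return [set(panel) | set(fresh) for fresh in fresh_by_period]

panel = {"p1", "p2"}                                    # hypothetical panel units
fresh = [{"a1", "a2", "a3", "a4"}, {"b1", "b2", "b3", "b4"},
         {"c1", "c2", "c3", "c4"}, {"d1", "d2", "d3", "d4"}]
samples = split_panel_samples(panel, fresh)             # pa, pb, pc, pd

# Overlap fraction for every pair of periods, whatever the lag between them.
overlaps = {(i, j): len(samples[i] & samples[j]) / len(samples[i])
            for i in range(4) for j in range(i + 1, 4)}

# Cumulation uses the nonoverlapping parts, which share no units.
cumulated = set().union(*fresh)
print(overlaps, len(cumulated))
```

In a classical rotation design the overlap would shrink with the lag and vanish beyond a fixed number of periods; here it stays at one third for all six pairs.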


6. CONCLUSIONS AND QUESTIONS 


Cumulated samples provide the bases for four new methods proposed here: rolling samples, 
rolling censuses, asymmetrical cumulations, and split panel designs. Rolling samples have been 
designed, but the other three still await practical applications. Meanwhile we should welcome 
methodological developments that would outline the parameters of feasibility. 

However, the chief tasks for these methods must be found in the details of specific situa- 
tions rather than in theoretical generalities. The factors of costs, variances, biases, feasibilities, 
and public acceptance for novel procedures must be worked out specifically for each situa- 
tion. We can do no more than raise a few questions as examples, in addition to those raised 
implicitly or explicitly in the preceding sections. 

1. For rolling samples and censuses what kinds of moving averages may prove most useful? 
For national aggregates the latest month (or quarter or year) may receive the full weight. But 
for small local areas the data may be cumulated over ten years; with equal or with increasing 
weights? Are ‘‘shrinking’’ (Stein-James) estimators useful?

2. How to deal in the aggregates with changes in the population, in methods, in variables? 

3. For asymmetrical cumulation similar questions arise. Should the latest monthly estimates 
(A) be printed together with the cumulated (C)? Methods are needed to make the cells and the 
marginals consistent. 

4. For the split panel design, how large should the overlap (p) be? Can it be a panel or merely 
overlapping segments? Or must we, can we, have both? How does it depend on the correla- 
tions for diverse variables? How do we balance the four chief purposes of periodic surveys? 

There will be other interesting questions but this essay must come to an end before they do. 


COMMENT


FRITZ SCHEUREN¹


The statistical literature has neglected the idea of cumulative samples. Leslie Kish, in several 
previous papers and in the present one, has tried to rectify matters. Ever forward-looking and 
practical, he makes a persuasive and compelling case for more work on the design and analysis 
issues raised by cumulation. 

His writing is so down-to-earth that readers may miss the fact that Kish is not just advo- 
cating a few minor additions to the already large supply of survey designs and estimation 
methods. He asks us to look very hard at the topology of the space/time/content trade-offs 
in surveys — especially in censuses. In fact, Kish seems to be advocating what might be called 
a ‘‘paradigm shift’’ in census-taking, at least in developed countries like Canada and the 
US. 

The word ‘‘paradigm’’ deserves some elaboration (Barker 1988). A paradigm is a way of
thinking and then doing, a pattern of belief and behavior, a way of seeing reality and using 
that sense to accomplish something. Paradigms are common - the way we get to work would 
be a humble example. Conventional census-taking, under this definition, could be characterized 
as a major scientific and technical paradigm. 

As long as our paradigms work well for us, we tend not to change them. Occasionally, how- 
ever, paradigms break down and have to be replaced. The bridge goes out and we need to find 
another route to work. As Kuhn pointed out in his seminal book on the structure of scientific 
revolutions, paradigms break down in science, as well (Kuhn 1970). Perhaps the most famous 
example of this is the revolution in the thinking of astronomers that occurred when the Ptolemaic
earth-centered view of the universe was replaced by the Copernican view of an earth that 
revolved, with the other planets, around the sun. 

Kish, in his paper, argues that major problems exist with the conventional census-taking 
paradigm. He then goes on to consider two possible alternatives: rolling censuses and 
administrative registers. My objective here will be to round out and occasionally balance Kish’s 
presentation of these topics. 


Conventional Census-Taking 


Conventional censuses, like those in Canada and the U.S., continue to do many things very 
well. Indeed, at present, we have no adequate substitute for them; nonetheless, Kish’s point 
of view on the need for at least some change seems compelling. Rising costs are a big factor. 
There have been many improvements in census-taking in this century; still, in both Canada 
and the U.S., total costs and even costs per person have risen significantly: 


• The 1990 decennial census in the U.S. is budgeted at about $10 (U.S.) per person. Even
adjusting for inflation, this is a four-fold increase over what the per capita expenses were 
in 1960. Item content differences between the two censuses are small and essentially not 
a factor in explaining the difference. Both the 1960 and 1990 Census, for example, asked 
only 7 population questions of everyone (U.S. Bureau of the Census 1989). The Census 
long-form sample in 1960 contained 35 questions and was to be completed by 25% of the 
population. For 1990, the Census long-form sample was given to 16% of U.S. households 
and had 33 questions. 


¹ Fritz Scheuren, Director, Statistics of Income Division, Internal Revenue Service. The opinions expressed here are
those of the author and do not necessarily represent the position of the Internal Revenue Service. 

• The situation in Canada is similar with regard to the costs of census-taking. For example,
the 1991 Canadian Census is budgeted at about $9.50 (CAN) per person. Like the U.S. 
Census, there are again just 7, albeit somewhat different, population items that are asked 
of everyone. Like the 1990 U.S. Census, questions on housing are included for everyone 
(2 in Canada and 7 in the U.S.). In Canada, a 20% long-form sample will be employed 
in 1991. The Canadian long-form questionnaire has 45 items for 1991. The 1961 census 
in Canada was quite different from that planned for 1991 and, thus, meaningful cost com- 
parisons are hard to make. Nonetheless, looking back 30 years in Canada, the same long- 
term trend in census-taking costs seems to exist; however, per capita costs have been 
roughly the same - even declining slightly - in the last two or three censuses. 


The U.S. Census Bureau has looked at the growing cost of conventional census-taking and 
concluded that a major change may be needed (Browne 1989). Labor costs have grown 
appreciably in recent decades in both Canada and the U.S. Technological improvements have 
not been great enough to offset these costs, though some, like TIGER (Topologically
Integrated Geographic Encoding and Referencing) and CATI (Computer-Assisted Telephone 
Interviewing), offer promise. Greater attention in the U.S. to improved population coverage 
is another important factor (Anderson 1990). The degree of public cooperation in the census 
also seems to be dropping, at least as reflected by the poorer than anticipated mail response 
rate for the 1990 U.S. census. (It should be noted that, in Canada, public cooperation has fluc- 
tuated, with no clear tendency.) 

Increasing cost is not the only major problem facing conventional census-taking. Perhaps 
of even greater importance, as Kish notes, is the growing rate of obsolescence of the informa- 
tion collected. The combination of rising costs and growing information obsolescence has had 
the effect of reducing the benefit/cost ratio for conventional censuses steadily and dramatically. 

To obtain more frequent small area data, some countries have introduced quinquennial cen- 
suses. For example, in Canada this was first done nationally in 1956. Budget problems led to 
the 1986 Canadian Census being cancelled and then reinstated. Indeed, it is unclear whether 
there will be a Canadian Census in 1996. While a quinquennial census was also legislated in 
the U.S., funds were never made available. 


Rolling Censuses 


As Kish rightly observes, conventional census-taking, of necessity, must sacrifice both 
timeliness and item content (on a 100% basis) to achieve complete spatial detail and high popula- 
tion coverage. 

One of the alternatives that Kish asks us to look at is a ‘‘rolling census.’’ His proposal envisions
the sampling of a country over a decade in such a way that every area is eventually covered.
In its purest form, space and time become a single dimension and content remains fixed, such 
that, at decade’s end, we have obtained cumulative information on the entire country for a
given set of items. 

The chief advantage of a rolling census is that it can avoid the problem of information 
obsolescence at national and major subnational levels. For small geographic areas, though, 
there would, of course, still be only one observation per decade. Unlike a conventional census, 
comparisons among small geographic areas would be very difficult to interpret because the 
data are being collected at different points in time (Fellegi 1981). 

For a rolling census or survey, unit costs could be higher, as Kish notes, than in a more con- 
ventional enumeration (indeed, ceteris paribus, maybe even higher than the cost of existing 
survey efforts). In an age of fixed or declining resources, therefore, it might not be possible
to do a complete ‘‘enumeration’’ each decade, even if content were significantly scaled back.
Rolling samples would seem to have their greatest attractiveness not as a replacement for
conventional censuses, but, say, as part of a strategy to link together census-taking with ongoing
surveys and local area population estimates for the intercensal years (Herriot, Bateman and 
McCarthy 1989). 

Both the United States and Canada employ monthly surveys to estimate the national (and 
some subnational) labor force characteristics. The Canadian Labor Force Survey (LFS) of 
64,500 households covers 0.67% of the total Canadian population each month. ‘‘Given the 
rotation pattern in effect for the LFS, the 0.67% sample per month rolls up into a 6.7% sample 
of unique households over a 5-year period’’ (Drew 1989). In the Canadian context, at least,
Kish’s proposal may be feasible. A sample survey vehicle could be designed, with some reduc- 
tion in the month-to-month household overlap, which could achieve many of the benefits he 
has stated for a rolling sample, while also meeting the information needs currently met by 
ongoing household surveys (Drew 1989). This sample would not replace the 100% census count 
data, itself, but, might be a partial substitute for Canada’s 20% long-form census sample. 

Because the United States has a population about 10 times larger than Canada, the tradeoffs 
involving rolling samples and overall country coverage are not as favorable as they are in 
Canada. The U.S. Current Population Survey (CPS), for instance, at about 60,000 households, 
covers only 0.06% of the total U.S. population monthly. Even if cumulated over a whole decade 
(but with no change in its rotation pattern), the CPS would cover only about 1% of all U.S. 
households. This does not compare well in size to the overall 16% long-form sample being con- 
ducted as part of the 1990 U.S. Census. 
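
The cumulated coverage figures quoted above can be checked with a rough calculation. The sketch below is only illustrative: it assumes that a fixed share of each month's sample is new to the survey, and the rotation details (one-sixth of the LFS sample and one-eighth of the CPS sample entering fresh each month) are assumptions not stated in the text. The function name and parameters are hypothetical.

```python
def cumulative_coverage(monthly_pct, fresh_share, months):
    """Approximate percent of unique households reached over `months`.

    monthly_pct: percent of the population in sample in any one month.
    fresh_share: assumed share of each month's sample that is new to the survey.
    The start-up month and any re-selection of households are ignored.
    """
    if not 0 < fresh_share <= 1:
        raise ValueError("fresh_share must be in (0, 1]")
    return monthly_pct * fresh_share * months


# Assumption: LFS households stay in sample six consecutive months,
# so about one-sixth of the monthly sample is new.
lfs_five_years = cumulative_coverage(0.67, 1 / 6, 60)

# Assumption: one of eight CPS rotation groups is new each month.
cps_decade = cumulative_coverage(0.06, 1 / 8, 120)

print(f"LFS over 5 years:  {lfs_five_years:.1f}% of households")
print(f"CPS over 10 years: {cps_decade:.1f}% of households")
```

Under these assumptions the results are close to the 6.7% and roughly 1% figures cited, which suggests that rotation overlap, rather than monthly sample size alone, drives the cumulated coverage.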

To bring the rolling sample population coverage nearer to the 1990 U.S. decennial sample, 
major changes in the CPS rotation pattern, like those Kish asks us to look at, would be needed. 
Other U.S. Census Bureau surveys might also have to be redesigned if the objective were to 
achieve even a partial substitute. Despite these changes, moreover, the resulting decade-long 
sample would still be only a small percent of the total U.S. population - perhaps, at best, in 
the 2% to 3% range, assuming resources and other requirements remained essentially fixed. 

In both Canada and the U.S., the likely higher unit costs of a rolling sample may need to 
be addressed by changes in survey procedures: how area segments are listed (Royce and Drew, 
1988); how first contact with households is made, etc. Where is it written, for example, that 
a personal interview contact is needed before using other modes of collection? 

It will be no mean challenge to keep effective sample sizes equal for the major level and 
change components now obtained from ongoing surveys (e.g., Tegels and Cahoon 1982). Some 
compromise may be needed, moreover, in the extent to which the basic content of the current 
long-form Census samples can be included. Despite these challenges, or perhaps because of 
them, Kish’s ideas on rolling samples deserve continued serious attention and should be the 
focus of extensive practical experimentation. 


Administrative Registers 


With the flowering of scientific sample survey methods in the 1940’s (Bailar, 1990), the use 
of administrative records for statistical purposes became relatively less important in many 
national statistics programs. By the early 1980’s, however, at least in the developed countries, 
the pendulum had begun to swing back. Kish recognizes this trend and rightly quotes Philip 
Redfern, who has been the major chronicler of this phenomenon internationally (Redfern 
1987). While the Danes seem to have gone the farthest (Jensen 1983 and 1987), major efforts 
have been made in Canada (e.g., Statistics Canada 1990) and even some in the U.S. (e.g., Alvey 
and Kilss 1990). 

A good summary of most of the key barriers to the greater use of administrative registers 
for census-taking is found in Redfern (1989), including the extensive discussion published with 
that paper. Perception barriers by the citizens (e.g., in Germany) are mentioned as problems. 
Psychological barriers by the national statistical service may, however, be of equal or even 
greater importance. Major scientific ‘‘paradigm shifts’’ generally have this problem (Kuhn 
1970). Certainly, this seemed to be part of the reason for the reception given to the proposal 
(made by me in 1980) to explore the feasibility of making administrative records an integral 
part of the U.S. Census of Population. While a sketch of such a proposal was eventually given 
at the 1982 American Statistical Association meetings (Alvey and Scheuren 1982), it seems, 
with a few fairly limited exceptions (e.g., Irwin 1984, Citro and Cohen 1985), that serious 
interest at the Census Bureau has been notably lacking. 

Suffice it to say that in the U.S. very little of the needed research has been undertaken. This 
is true, despite continuing efforts to give the proposal prominence (Jabine and Scheuren 1985 
and 1987) and to get it discussed widely (Butz 1985). Sadly, therefore, I have to agree that Kish 
is probably right that in the United States, at least for the year 2000, ‘‘... we should not expect 
[administrative registers] to replace censuses.’’ 

The 1990 U.S. decennial census could have been used as a proving (or disproving) ground 
for some of the needed research into administrative record alternatives. Why that didn’t happen 
is a matter that can only be speculated about. A contributing factor, quite possibly, is a case 
of ‘‘paradigm paralysis’’ (Barker 1988). The literally decades-long controversy about whether 
to adjust census ‘‘counts’’ seems to have locked the U.S. Bureau of the Census into what some, 
at least, would call an increasingly sterile intellectual position (Fienberg 1990). The viewpoint 
that they have adopted makes it very hard for them to see any alternative, like a (partial) admin- 
istrative record approach, that starts out with the notion that adjustments would be required. 

The situation is different in Canada. Since the late 1970’s, Statistics Canada has assembled 
many of the building blocks needed to conduct an administrative record census (e.g., Drew 
1989; Podoluk 1987; Verma and Raby 1989). While much remains to be done, such a change 
could even happen as early as 1996. For example, the coverage of the Canadian tax return 
system, alone, is quite high and growing. In 1987, for instance, it has been estimated that the 
coverage was about 94%, i.e., about 3 percentage points less than the 96.8% coverage achieved in the 1986 
Canadian Census. By 1991, tax return coverage, alone, should be up to about 97% or better, 
with overall administrative record coverage still higher and likely to grow further in the 1990’s. 


Kish expresses concern that administrative registers, even after they become adequate in 
quality and coverage, will ‘‘supply only a few, bare demographic variables: head counts, age, 
sex and little more.’’ An immediate observation concerning his remark is that conventional 
censuses do little more than this, themselves, at least for the 100% items. It is also evident that, 
while the variables on administrative records are not the same as those collected in a traditional 
census, there may already be more available than Kish realizes (e.g., Meyer 1990; Alvey and 
Scheuren 1982). 

More important even than any current item content comparison is the need to emphasize 
that the proposal to use administrative registers in census-taking does not envision that 
administrative records have to be used as they are. Administrative records will need to be 
changed. In my personal opinion, limited optimism about achieving needed changes is justified. 
However, without a doubt, it is too much to expect of administrative records that they will 
be able to capture exactly the same concepts now measured in censuses and surveys. Addi- 
tionally, there almost certainly will need to be special efforts, using existing census-taking tech- 
niques, to separately enumerate certain groups. The efforts in the 1990 U.S. Census to count 
the homeless would be one such example. 



Censuses and administrative records each have inherent limitations. Unavoidable concep- 
tual differences will be a major barrier to any shift from one medium to another. Administrative 
feasibility is another issue; however, some hard-to-duplicate census concepts (e.g., households) 
may not be as important to the measurement process as formerly. 

Shifts in methodology (from a conventional census to administrative records) for some uses 
would potentially be accompanied by a parallel shift in the underlying concepts measured. Some 
concepts may alter or expand in meaning, including our ability to measure them (e.g., fami- 
lies). We also must ascertain the extent to which respondents answer survey questions the same 
way they fill out administrative forms that may have real direct impact in their lives. 

In recent years, traditional survey methodology has been enhanced by new tools from the 
field of cognitive psychology. These cognitive research tools could be used to understand any 
conceptual differences between the meaning of terms when they are used in surveys or drawn 
from administrative records. We may not have what we think we have anyway (Bates and 
DeMaio 1989). In any case, there is already an extensive body of cognitive research that can 
be drawn on (e.g., Dippo 1987; Fienberg and Tanur 1989; Jobe and Mingay 1990). 

Kish is close to the mark when he goes on to say that administrative registers ‘‘will fail to 
meet the demands of modern society for richer sources of statistics.’’ Such demands, of course, 
appear to be insatiable. Even if they were not, administrative records will never have the flex- 
ibility and responsiveness of surveys. Registers, however, (including partial ones like those that 
exist in the U.S.) when linked to survey data, can be extremely important as auxiliary variables 
in making improved direct national survey, and even subnational survey, estimates. The U.S. 
Census Bureau’s Survey of Income and Program Participation research on the use of Internal 
Revenue Service data for improving the precision of national survey estimates is a good recent 
example (Huggins and Fay 1988). Indirect (e.g., synthetic) estimates for small areas would still 
be needed for variables not on the administrative registers (Platek, Rao, Särndal, and Singh 
1987). The registers, though, might provide a source of valuable symptomatic indicators. 


Concluding Observations 


The case Kish makes for considering a ‘‘paradigm shift’’ in census-taking seems compelling, 
at least in developed countries like Canada and the U.S. The rolling census alternative he pro- 
poses is probably too expensive to fully implement as a complete substitute for a census. Rolling 
samples do offer real promise, however, if they can be integrated into the current ongoing survey 
operations of Canadian and U.S. national statistical programs. Such samples could provide 
a needed link in addressing small area estimation needs that might otherwise not be met. Less 
promising, but still possible, is their use as a (partial) substitute for the census long-form samples. 

Kish may be unduly pessimistic about administrative registers. The Canadian situation, how- 
ever, differs from the United States: 


• In Canada, it is already within the realm of feasibility to combine rolling samples with 
administrative records as an alternative to conventional census-taking. This is not to say 
that enormous practical challenges don’t remain. The 100% count portion of the Cana- 
dian census, though, could be done with administrative records as a starting point, aug- 
mented by a large-scale survey to measure and potentially adjust for undercoverage. The 
Canadian 20% census long-form sample might be, at least partially, replaced by a rolling 
sample. The content of the Census long-form is considerably richer than that of household 
surveys, but the content differences could be made up through additional questions 
‘‘piggybacking’’ on the ongoing surveys at regular intervals. Coverage issues surrounding the use 
of administrative records could also be addressed directly with rolling samples, especially 
to calibrate for changes in administrative records between censuses. 

• In the United States, the U.S. Census Bureau has begun to look at alternatives other than 
conventional census-taking (Bounpane 1988). Unfortunately, the research needed to look 
at an administrative register alternative has barely begun. Whether the Census Bureau 
will find a better approach than the use of administrative records and rolling samples 
remains to be seen (Browne 1989). Whatever other alternatives they study, however, the 
use of administrative registers as a partial replacement for the conventional 100% counts 
definitely needs to be considered. A preliminary research agenda updating earlier ideas 
is given in Scheuren, Alvey and Kilss (1990). 


Kish is right in saying that, with the radical proposals he (and I) are discussing, the answer 
is uncertain. Like him, I believe that ‘‘the balance of variance components’’ favors a change 
from conventional census-taking in most cases. ‘‘However, theoretical as well as empirical 
investigations will be needed to decide matters.’’ 

In a change as big as the one proposed here, the ‘‘balance’’ that needs to be struck goes, 
of course, well beyond looking at variance (and bias) components. Kish recognizes this in 
numerous ways in his paper. One issue that needs to be emphasized more, though, is that some 
aspects, at least, of the paradigm shifts being considered could go to the heart of the social 
contract that exists between national statistical agencies and the people that those agencies have 
a mission to serve. For instance, in the U.S. Constitution, there is a requirement that an 
‘‘enumeration’’ of the population take place every ten years. Would the use of administrative 
records or rolling censuses fit within this ‘‘Constitutional paradigm?’’ Perhaps the starting 
place is to adopt a broader definition of ‘‘enumeration.’’ 

Another example where social contract issues arise is the extent to which the greater use of 
existing (or expanded) administrative data for statistical purposes might be seen as an 
unwelcome increase in the intrusiveness of the State into the private lives of its citizens (Grace, 
1989). As legitimate as concerns about ‘‘intrusiveness’’ might be, though, there is no evidence 
in a North American context, at least, that they pose an insurmountable barrier. On the 
contrary, there have been virtually no adverse public reactions to past U.S. additions to 
administrative records for statistical purposes (e.g., of residential address information in 1972, 
1974 and 1980 tax returns). To my knowledge, the issue has not yet come up directly 
in Canada, at least at the Federal level. 

In summary, to make changes of the types being discussed by Kish, there is, as he points 
out, the need for a lot more scientific research. Studying the implementation technologies will 
be an even bigger job. Finally, the issues go beyond our profession and may well be settled 
in other arenas. Wherever they are decided, it is incumbent on us, as statisticians, to frame 
the debate in terms of feasible options. Kish has taken us a long way down that path and is 
to be greatly congratulated. 


Comments on Articles in the Special Section 


MORRIS H. HANSEN¹ 


These are excellent papers that I enjoyed reading. Three of these papers focus primarily on 
historical and current developments and, to some extent, look to the future. The paper by 
Kish focuses on, and is an effort to influence, some important future developments. I will 
attempt to add a little clarification from my own personal history and point of view on the 
historical summaries, and a little perspective, again from my personal point of view, on Kish’s 
proposal for rolling censuses to replace the more traditional censuses. 

Rao and Bellhouse have given a compact but useful survey of sampling development. Their 
summary begins, after a few preliminaries, at about the time that I first began to participate 
in censuses and sample surveys, and their improvement. 

Their survey is done about as well as can be accomplished in such a compact summary, 
without elaborating on details. However, I would like to provide a slightly different view than 
they present on the development of sampling with probabilities proportionate to size or to 
measures of size (PPS). They accurately indicate that we (Hansen and Hurwitz) developed the 
theory for PPS sampling with replacement as an approximation. We were unsuccessful in 
solving the problem of variance estimation with varying probabilities when sampling without 
replacement, a problem that was soon solved by Horvitz and Thompson and by others. However, with possibly 
rare exceptions, we never proposed the use of or used sampling with replacement. In practice, 
we did PPS sampling without replacement, usually either by choosing two or more units from 
a stratum by a systematic sampling procedure with the units arranged in a random or systematic 
sequence, or by choosing one unit per stratum. Units that would have had high probabilities 
of selection were selected with certainty. We prepared estimates of aggregates and functions 
of these by weighting by the reciprocals of the probabilities, exactly as in what has come to 
be referred to as the Horvitz-Thompson estimator. The variance estimators resulted in moderate 
overestimates because they assumed sampling with replacement as a simplification. Ordinarily, 
we have not regarded moderate overestimates of variance as a serious concern. The ultimate 
cluster variance estimator was often used. This is a very simple approximate variance estimator 
that involves weighting (if subsampling has been used) within the first stage units up to the 
first stage unit level, and then computing the variance between such first-stage unit estimates 
(see Hansen, Hurwitz, and Madow, p. 257). Horvitz and Thompson provided the initial 
breakthrough in variance estimation when sampling more than one unit per stratum with 
varying probabilities. 
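
The two estimators described in this paragraph can be illustrated with a short sketch. It is a minimal illustration rather than the Bureau's actual procedure: the record layout `(stratum, psu, weight, y)` and the function names are hypothetical, and the variance formula is the common textbook form of the ultimate cluster estimator, which treats first-stage units as if drawn with replacement within each stratum.

```python
from collections import defaultdict


def weighted_total(values, probabilities):
    """Estimate a total by weighting each sampled value by the
    reciprocal of its overall selection probability."""
    return sum(y / p for y, p in zip(values, probabilities))


def ultimate_cluster_variance(records):
    """Approximate variance of an estimated total.

    records: iterable of (stratum, psu, weight, y), where weight is the
    reciprocal of the element's overall selection probability.
    Elements are weighted up to the first-stage unit level, and the
    variance is computed between first-stage unit estimates in each stratum.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, y in records:
        psu_totals[stratum][psu] += weight * y

    variance = 0.0
    for stratum, totals in psu_totals.items():
        n = len(totals)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} needs at least two first-stage units")
        mean = sum(totals.values()) / n
        variance += n / (n - 1) * sum((t - mean) ** 2 for t in totals.values())
    return variance


# Hypothetical sample: one stratum, two first-stage units, weight 10 throughout.
sample = [("A", 1, 10.0, 2.0), ("A", 1, 10.0, 3.0), ("A", 2, 10.0, 4.0)]
estimate = weighted_total([y for *_, y in sample], [1 / w for _, _, w, _ in sample])
variance = ultimate_cluster_variance(sample)
print(estimate, variance)
```

Because the formula ignores the without-replacement selection, it tends to overstate the variance somewhat, which is the moderate overestimate referred to above.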

Sampling with PPS had the advantages that Rao and Bellhouse briefly describe. In addi- 
tion, its use was a great convenience in multistage sampling, with probabilities proportionate 
to measures of size at each stage up to the final. The probabilities at the final stage were often 
set to achieve uniform overall probabilities of selection of the elementary units. 
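
A small sketch of the two-stage case shows how the final-stage probabilities can be set to give uniform overall probabilities. The function and its parameters are hypothetical, and the sketch assumes that no first-stage unit is large enough to require selection with certainty.

```python
from fractions import Fraction


def two_stage_probabilities(sizes, n_first, take_per_psu):
    """Return (first-stage, second-stage, overall) selection probabilities.

    sizes: measure of size for each first-stage unit.
    n_first: number of first-stage units selected with PPS.
    take_per_psu: number of elementary units taken within each selected unit.
    """
    total = sum(sizes)
    rows = []
    for size in sizes:
        first = Fraction(n_first * size, total)   # proportionate to size
        if first > 1:
            raise ValueError("unit should be taken with certainty; not handled here")
        if take_per_psu > size:
            raise ValueError("cannot take more elements than the unit contains")
        second = Fraction(take_per_psu, size)     # inversely proportionate to size
        rows.append((first, second, first * second))
    return rows


# Hypothetical first-stage units with 100, 300 and 600 elements.
for first, second, overall in two_stage_probabilities([100, 300, 600], 1, 10):
    print(first, second, overall)
```

The size measure cancels in the product, so every elementary unit has the same overall probability and each selected first-stage unit yields the same interviewer workload.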

I add one other comment on their paper with respect to jackknife variance estimation. They 
indicate that the jackknife variance estimators are known to be inconsistent for nonsmooth 
functions like quantiles, even in the case of simple random sampling. They might have said, 
especially in the case of simple random sampling of the elements that are the units of analysis. 
We have recently demonstrated empirically that variances of medians and (in this case) 
of 10th and 90th percentiles can be well estimated with the usual ultimate cluster jackknife 
variance estimation procedure with multistage sampling in which two or more first-stage units 
or combinations of them are identified in a stratum (one dropped and the other doubled, to 
form a replicate). We hypothesize that jackknife worked well in these applications because 
each ultimate cluster associated with a first-stage unit contains a substantial number of ele- 
mentary units in the sample. We anticipate that it would work equally well, although we have 
not demonstrated it, when the jackknife replicates are formed by another procedure often 
followed, in which a simple random (or stratified random) sample is divided into m simple 
random subsamples (or stratified random subsamples utilizing the same strata to the extent 
feasible), and dropping one subsample at a time. 
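
The paired replicate procedure mentioned above (one first-stage unit dropped and the other doubled) can be sketched as follows. This is an illustrative sketch, not the procedure used in the empirical work cited: it assumes exactly two first-stage units per stratum, forms one replicate per stratum, always drops the unit with the lower label (in practice the choice would normally be random), and sums squared deviations of the replicate estimates from the full-sample estimate. All names are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight reaches the fraction q."""
    pairs = sorted((v, w) for v, w in zip(values, weights) if w > 0)
    if not pairs:
        raise ValueError("no observations with positive weight")
    total = sum(w for _, w in pairs)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]


def paired_jackknife_variance(records, q=0.5):
    """records: list of (stratum, psu, weight, y), two PSUs per stratum.

    Returns (full-sample quantile, jackknife variance estimate).
    """
    values = [y for _, _, _, y in records]
    full = weighted_quantile(values, [w for _, _, w, _ in records], q)

    psus_by_stratum = {}
    for stratum, psu, _, _ in records:
        psus_by_stratum.setdefault(stratum, set()).add(psu)

    variance = 0.0
    for stratum, psus in psus_by_stratum.items():
        if len(psus) != 2:
            raise ValueError(f"stratum {stratum!r} must contain exactly two PSUs")
        dropped = sorted(psus)[0]  # arbitrary choice for the sketch
        replicate_weights = [
            w if s != stratum else (0.0 if p == dropped else 2.0 * w)
            for s, p, w, _ in records
        ]
        replicate = weighted_quantile(values, replicate_weights, q)
        variance += (replicate - full) ** 2
    return full, variance


# Hypothetical data: two strata, two PSUs each, two elements per PSU.
data = [
    ("A", 1, 1.0, 1), ("A", 1, 1.0, 2), ("A", 2, 1.0, 5), ("A", 2, 1.0, 6),
    ("B", 1, 1.0, 3), ("B", 1, 1.0, 4), ("B", 2, 1.0, 7), ("B", 2, 1.0, 8),
]
median, jk_variance = paired_jackknife_variance(data)
print(median, jk_variance)
```

With only two elements per cluster the toy data say nothing about how well the method performs; the point made in the text is that it behaves well when each ultimate cluster contains many elementary units.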


Fienberg and Tanur have presented an interesting perspective on the influence of the institu- 
tional setting in which survey research has developed. I agree with their view that an improved 
understanding of the development of survey methods is achieved by an understanding of the 
institutions through which survey research and surveys are done. At least those survey 
developments in which I have participated have arisen largely out of the institutional setting, 
and the need and opportunity to solve problems that occurred in accomplishing programs of 
the institution. Again, I have comments on some of the details in the developments in which 
I was a participant. 


Fienberg and Tanur properly indicate that the design of what is now known as the Current 
Population Survey or CPS (earlier known as the Labour Force Survey) had a key role in the 
evolution of sampling theory and its application that has influenced other developments. How- 
ever, they incorrectly suggest that its principal origins were in the experimental Trial Census 
of Unemployment carried out in late 1933 and early 1934 as a Civil Works Administration 
(CWA) project in three cities. There is some confusion in their paper of the 1933-34 CWA trial 
census with the 1937 ‘‘Enumerative Check Census’’ that accompanied the 1937 ‘‘Unemployment 
Census’’. It was the latter that, as they mention, Dedrick, Hansen, Stouffer, and Stephan 
jointly worked on, and that was the progenitor of the CPS. The 1937 Unemployment Census 
was a national registration done through the Post Office. The Enumerative Check Census was 
taken by mail carriers in a national probability sample of postal routes - they took a complete 
census of each postal route in the sample. New concepts for measuring labour force and 
unemployment were developed and applied in it based on behavior in a prior week. It was also 
a first application of nationwide area probability sampling. Its purpose was to evaluate the 
1937 national registration of the unemployed (as discussed in the accompanying paper by 
Barbara Bailar). That sample survey taught us much, and was the seed for the monthly Labor 
Force Survey, later to become the Current Population Survey. Again, I was an active 
participant. Bailar describes it well. Stock, Frankel, and Webb and others at the Work Projects 
Administration (WPA) also had a role in the design of the national registration and of the 
Enumerative Check Census. Those were the days of dire unemployment, and the need for a 
continuing measure was obvious and urgent. 


With this experience, Stock, Frankel, and Webb, along with their colleagues at WPA, 
perceived the opportunity and need for a continuing survey. They initiated a monthly unemployment 
and labor force survey, introducing some imaginative concepts in survey design (but also 
some problems that needed later correction). The monthly survey was just getting well esta- 
blished when Pearl Harbor and U.S. entry into World War II occurred, and the needs for infor- 
mation were radically changed. Labor shortage rather than high unemployment became the 
problem. The WPA was no longer needed and was abolished, and the survey was transferred 
to the Bureau of the Census to become a labor force survey to measure especially war-time 
implications of labor force participation and employment. When the survey was transferred 
to the Bureau of the Census we perceived some problems in the original design and developed 
solutions to them, which led to the introduction, among other things, of PPS sampling and 
other design innovations. These developments for the labor force survey (now the CPS with 
a much broader role) have had a substantial impact on sample methodology, and more impor- 
tant, on meeting the needs of the nation for up-to-date information, not only on labor force 
but on many other subjects - demographic, social, and economic. 


Fienberg and Tanur might also have emphasized the remarkable consequences of bringing 
together census-taking and sampling, along with computerization and automated reading of 
position marks on census questionnaires. In modern censuses in the United States, beginning 
with the 1960 Census, the questionnaires used for collecting information from all households 
are relatively brief in content. The principal content of the censuses is now obtained through 
samples taken simultaneously with and as part of the census, and, of course, on an exceedingly 
large scale in order to produce useful data for perhaps 40,000 small areas. A related develop- 
ment was the introduction in the 1960 Census of self-enumeration methods. The decision to 
introduce self-enumeration was guided by the application of the response error model to which 
Fienberg and Tanur refer, and by associated research and experiments on response errors, and 
especially on the correlated response errors associated with the work of enumerators. These 
innovations were guided by large-scale experiments that were done prior to and as part of the 
1950 Census and in later censuses as well as in separate experiments. Another contribution was 
FOSDIC (Film Optical Sensing Device for Input to Computers), a device for reading position 
marks designed by the Bureau of Standards at the Census Bureau’s request, in response to 
Census Bureau needs to replace the massive key-punching effort in a census. A consequence 
of the innovations that were introduced was more timely results and generally more accurate 
censuses, as well as lower costs. The opportunities for progress arose in view of the problems 
of large-scale census taking, and how they might be solved with the application of sampling 
and self-enumeration, along with the remarkable advances made possible by the development 
and application of electronic computers and FOSDIC, in which the Census Bureau was a 
pioneer. 


In the late 1930’s, some of the top Census Bureau staff, as well as members of Congress, 
were reluctant to see sampling introduced into the work of the Census Bureau. Complete 
enumeration had been the tradition. The use of probability sampling in the 1937 enumerative 
check census associated with the national unemployment registration was an important factor 
in achieving the acceptance of sampling as a methodology appropriate to the Bureau of the 
Census, again as more fully told in the accompanying paper by Bailar. The 1940 population 
census was a pioneering effort in the application of sampling in the collection of supplemental 
items of information in a census. In this effort Deming and I worked as colleagues. I was 
working with Calvert Dedrick, and Deming with Philip Hauser, with effective consultation 
and advice from Fred Stephan, and we all worked as a team in developing this important 
milestone in the application of sampling. 


I have little in the way of comments to add to the paper by Barbara Bailar. As the paper 
indicates, I was an active participant along with Bill Hurwitz and our colleagues, in the 
developments she describes so well. I do have a minor correction. Fienberg and Tanur 
correctly identify the 1951 paper on response error models by Hansen, Hurwitz, Marks, and 
Mauldin as the original publication on the model, which Bailar credits to a later (1960) paper 
by Hansen, Hurwitz, and Bershad. The later paper elaborated those results, and included 
empirical data from the application of the model in large-scale randomization experiments 
involving the random assignment of enumerators in the 1950 Census. Analysis of these results 
as summarized in the 1960 paper showed the substantial and striking impact on small area census 
statistics of correlated errors within the work of interviewers. Earlier memoranda containing 
the results reported in that paper, and associated studies, were the principal vehicles that led 
to the use of self-enumeration as the procedure for collecting the principal content items in 
the 1960 Census. They also led to transferring the collection of much of the information to 
a large sample instead of a complete census, with substantial cost reduction implications, 
improved timing, and generally improved quality. Bailar’s paper provides an excellent sum- 
mary description. 

I should note, in this connection, the remarkable contribution to these developments that 
came from Bill (William N.) Hurwitz. He and I worked as a team that was far more effective 
than the sum of our individual contributions. In addition, I cannot give enough credit to our 
colleagues that we recruited and helped to stimulate and to some extent train, and who became 
the backbone of developments in the Census Bureau in the application of sampling, quality 
control, and operational research methods to the successful design and conduct of samples 
and censuses in wide ranging subject areas. Leaders among these colleagues included Max 
Bershad, Joseph Daly, Leon Gilford, William Madow, Eli Marks, Harold Nisselson, Jack 
Ogus, Leon Pritzker, Joseph Steinberg, Benjamin Tepping, Joe Waksberg, Ralph Woodruff, 
and others. I often get much of the credit, but without Bill Hurwitz, especially, and our col- 
leagues, it could not have occurred. 

I should mention that we benefited greatly, also, from the participation and advice from 
a panel of statistical consultants, with Bill Cochran (William G. Cochran) as chairman, over 
the years from 1955 until I left the Bureau in 1968. Other principal members included Fred 
Stephan (Frederick F. Stephan) and Bill Madow (William G. Madow) for the full time period, 
and Ivan Fellegi from Statistics Canada, H.O. Hartley, and others for part of the time. All 
were exceedingly able. However, we did not look to them as experts whose advice would simply 
be sought and generally followed. Instead, we operated on an interactive basis. We discussed 
specific issues or problems as well as all phases of total survey design for a particular survey, 
experiment or census. We received much useful advice; they also learned from us. 

The paper by Leslie Kish moves the emphasis from historical background and recent and 
current advances to proposals for taking censuses of the future, through the introduction of 
what he calls rolling censuses. He also describes rolling samples in various forms. 

The kinds of rolling samples that he discusses, with and without overlapping panels, 
are, as he indicates, in use for various purposes at the present time, and his discussion of these 
does not propose anything new. I suppose he introduces them for generality and as a means 
of suggesting their potential relationship to a rolling census. 

The particular rolling census he describes is a weekly sample, with the total population of 
housing units at each point of time subdivided into 520 subsamples, one to be covered each 
week over a 10-year period. Thus, the entire population of housing units would be covered 
in a decade except for new additions of housing units in samples that had already been covered 
earlier in the decade. If the procedure were continued over time, then at any point in time the 
aggregate of the 520 samples for the prior ten years would provide average census results, 
representing the average situation over the prior 10-year period. It is an interesting and 
imaginative proposal. However, there are also problems. 
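
The structure of the weekly design just described can be sketched in a few lines. The sketch assumes a fixed list of housing units assigned at random to 520 equal weekly subsamples; it ignores new construction, stratification and clustering, and the function names are hypothetical.

```python
import random

WEEKS_PER_DECADE = 520


def assign_weekly_subsamples(unit_ids, n_subsamples=WEEKS_PER_DECADE, seed=0):
    """Randomly assign each housing unit to one weekly subsample (0-based)."""
    shuffled = list(unit_ids)
    random.Random(seed).shuffle(shuffled)
    return {unit: position % n_subsamples for position, unit in enumerate(shuffled)}


def units_covered(assignment, first_week, last_week):
    """Units enumerated between first_week and last_week, inclusive."""
    return {unit for unit, week in assignment.items() if first_week <= week <= last_week}


# Hypothetical frame of 5,200 housing units.
assignment = assign_weekly_subsamples(range(5200))
one_week = units_covered(assignment, 0, 0)
one_year = units_covered(assignment, 0, 51)
full_decade = units_covered(assignment, 0, WEEKS_PER_DECADE - 1)
print(len(one_week), len(one_year), len(full_decade))
```

Each week covers 1/520 of the frame and each year one-tenth, so an estimate built from the full set of 520 subsamples necessarily averages over conditions spanning the whole decade.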

He suggests a rolling census without any overlap in the coverage in successive weeks or other 
periods, except after the full decade when it starts all over again. Such an approach would pro- 
vide a large national cross section sample each week, as well as average or aggregate results 
for each month, each year, and for other periods. However, without any overlap in the samples, 
it will be a relatively crude instrument for measuring changes occurring in small areas from 
week to week, from month to month, or even from year to year. Overlapping samples might 
be introduced, as he indicates, but would add greatly to costs. Of course, changes can be 
measured with the proposed rolling samples, but without partially overlapping samples the 
result would be large sampling errors of estimates of change for small areas. Providing data 
for small areas is a primary purpose of the Decennial Census. I believe that reliably measuring 
such changes may be as important as providing aggregate measures for points in time. While 
Kish recognizes this, he seems to dismiss it. 
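
The effect of overlap on estimates of change can be illustrated with a standard textbook simplification, which is not taken from either paper. It assumes two simple random samples of equal size n, equal unit variance on both occasions, a proportion of the sample retained from one occasion to the next, and a correlation rho between a retained unit's two measurements. The function name and inputs are hypothetical.

```python
def variance_of_change(unit_variance, n, overlap, rho):
    """Variance of the difference between two sample means.

    overlap: proportion of the sample common to both occasions (0 to 1).
    rho: correlation between occasions for a unit measured twice.
    """
    if not 0 <= overlap <= 1:
        raise ValueError("overlap must be between 0 and 1")
    if not -1 <= rho <= 1:
        raise ValueError("rho must be between -1 and 1")
    return 2 * unit_variance / n * (1 - overlap * rho)


# Hypothetical small area: 100 sampled units per period, correlation 0.8.
no_overlap = variance_of_change(1.0, 100, 0.0, 0.8)
three_quarters = variance_of_change(1.0, 100, 0.75, 0.8)
print(no_overlap, three_quarters)
```

Under these assumptions a non-overlapping design gains nothing from the correlation between occasions, whereas a 75% overlap cuts the variance of the estimated change by more than half.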

Undercoverage of the population would likely be a particularly serious problem with a rolling 
census. Because of the general recognition by the public of the need for censuses, along with 
the intense publicity that is feasible for a census, the completeness of coverage of the censuses 
has traditionally been much greater than that in even the best sample surveys (although cov- 
erage still remains a problem in the censuses). The problem of net undercoverage in sample 
surveys is quite general - even including the Current Population Survey in the U.S. which is 
often taken as a model. Public interest with continuing weekly publicity for a rolling census 
could not conceivably be maintained. 

Another issue in my judgment is the likely high cost of such a system. Kish recognizes this, 
also, and then seems to dismiss it. While I have not seen any cost estimates, I would not be
surprised if over a decade the rolling census cost substantially more than the cost of
taking complete censuses quinquennially, plus the cost of relatively large-scale monthly samples 
to provide measures of change and information on various subjects for states and large areas 
within most states. Moreover, I anticipate that quinquennial censuses would be easier to inter- 
pret and more useful by providing measures for small areas at points in time, or for short 
intervals of time, rather than providing average measures over periods up to ten years. 

The Census Bureau, influenced in part by Kish’s earlier recommendations for such a rolling
census and by the desire to spread the workload, has put forward for consideration some
alternatives for taking a brief decennial census along with rotating censuses. They consider
several approaches to rotating censuses of whole states over a decade. It is an
innovative proposal intended to spread the workload while avoiding the high cost of a rolling 
census such as described by Kish. 

I am one who believes that a quinquennial census, along with ongoing large-scale current
surveys, is well worth a substantial cost. However, I believe that if a rolling census were
adopted, as proposed by Kish, overlapping samples should be used. A rolling census, even 
without overlapping samples, may cost considerably more than the cost of the current census 
program extended to include a quinquennial census. I question if it is worth the added cost, 
or that it has advantages over a quinquennial census plus substantial intercensal samples. I 
anticipate that the rolling census approach would yield less useful information than quinquen- 
nial censuses for most purposes because it would provide complete census counts only for 
averages over a 10-year period. Quinquennial censuses, along with sufficiently large current 
samples to provide relatively up-to-date information for large areas, along with other procedures 
for providing data for state, county, and perhaps also small area population estimates, seem 
to have advantages from a cost-benefit point of view. 

Kish is to be commended for his efforts to solve some of the census problems by a radical 
new approach. However, to me, the rolling census does not appear to be the answer. Perhaps 
more effective utilization of administrative records can provide results that hold more promise, 
again along with current samples and a decennial or, hopefully, quinquennial census. Perhaps
the remarkable new computerized mapping and coding system (known as TIGER) developed
by the Census Bureau for the 1990 Census holds much promise for improving
census-taking and current sample surveys. In addition, incorporating the TIGER
geographic coding into the major administrative records systems might make them more
accessible for population estimates and for other uses. I hope that up-to-date maintenance of
TIGER, along with a currently maintained address register, will be included in the Census
Bureau’s future plans.
J.N.K. RAO and D.R. BELLHOUSE


We thank the discussants, Hansen and Smith, for their useful comments. 


Hansen provided important observations on the development of PPS sampling. He is correct
in saying that Hansen and Hurwitz (1943) did not propose the use of sampling with replacement
and that they assumed sampling with replacement only for variance estimation.
Incidentally, Murthy (1967, p. 184) notes that Mahalanobis (1938) has referred to PPS sampling 
and the associated unbiased estimator of a total in the context of sampling plots for a crop 
survey. 


Hansen also made some interesting observations on the use of the delete-1 cluster jackknife
variance estimator for nonsmooth functions like quantiles. It is now well-known that the 
delete-1 jackknife variance estimator of a quantile is inconsistent under simple random 
sampling. Empirical results in Kovar, Rao and Wu (1988) indicate that it is also inconsistent 
under stratified simple random sampling. It is also likely inconsistent under stratified cluster 
sampling if the subsamples from the clusters are small or if the intra-cluster correlations are 
significant. In Hansen’s application the subsamples from the clusters are quite large and the 
intra-cluster correlations very small. In this case, the delete-1 cluster jackknife variance estimator
may be well-behaved in view of Shao and Wu’s (1989) result that the delete-d jackknife
variance estimator, under simple random sampling, is consistent, provided $n^{1/2}/d \to 0$ and
$n - d \to \infty$ as the sample size $n \to \infty$.


The method of dividing a simple random sample into m subsamples, each of size d say, and
dropping one subsample at a time, as suggested by Hansen, is similar to Shao and Wu’s delete-d
jackknife except that they consider all $\binom{n}{d}$ subsamples in constructing the variance estimator.
However, the delete-d jackknife variance estimator is likely to be more stable. Shao and Wu
also consider balanced subsampling requiring only b subsets of size n-d, where b ($\ge n$) is the
number of blocks in a balanced incomplete block design.
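
As an illustration only, the grouped procedure attributed to Hansen above can be sketched in Python. The function name, the random assignment to groups, the requirement that m divide n exactly, and the usual delete-a-group variance formula (m - 1)/m times the sum of squared deviations of the replicates from their mean are assumptions made for this sketch; it is not Shao and Wu's full delete-d estimator, which uses all subsets.

```python
import random
import statistics


def grouped_jackknife_variance(sample, m, estimator=statistics.median, seed=0):
    """Grouped jackknife: split the sample at random into m groups of
    equal size d, drop one group at a time, and recompute the estimator."""
    n = len(sample)
    if m < 2 or n % m != 0:
        raise ValueError("need m >= 2 and a sample size that is a multiple of m")
    d = n // m

    # Random assignment of units to groups (an assumption of this sketch).
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)

    # One replicate estimate per deleted group.
    replicates = []
    for g in range(m):
        kept = shuffled[:g * d] + shuffled[(g + 1) * d:]
        replicates.append(estimator(kept))

    centre = sum(replicates) / m
    return (m - 1) / m * sum((r - centre) ** 2 for r in replicates)
```

With m equal to n (so d = 1) and the mean as estimator, this reduces to the familiar s²/n, which gives a quick check of the formula.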


Smith provided some important observations on the foundational aspects of sample survey 
theory, in particular, on the importance of Ericson’s (1969) work on Bayesian estimation of 
a total under exchangeable priors. In this connection, we note that equivalent results for the 
posterior mean and the posterior variance, under simple random sampling, were also obtained 
by Hartley and Rao (1968). A. Scott pointed out the similarity of the two approaches in his 
discussion of Ericson’s paper. However, an advantage of the Hartley-Rao approach is that 
the inferences depend on the sample design, unlike Ericson’s approach. Their approach also 
yields useful classical inferences. Rao and Ghangurde (1972) extended the Hartley-Rao results 
to stratified random sampling, double sampling with unknown strata sizes, the Hansen-Hurwitz
method for handling nonresponse, and two-stage random sampling. 


The GUT approach for inference proposed by Smith looks very promising. We agree with
Smith that the point estimators using the different approaches rarely differ very much in practice,
and that the issue essentially reduces to the choice of a measure of uncertainty, as noted
in our paper.

We also agree with Smith on the importance of measuring total survey error from ongoing 
surveys. 
STEPHEN E. FIENBERG and JUDITH M. TANUR


We are grateful to Bob Groves and Morris Hansen for their insightful comments and to 
the editor of Survey Methodology for the opportunity to update our thinking in 1990 rather 
than waiting for 2040. Groves and Hansen make several important points; we shall attempt 
to react to them in turn. 


We very much like Groves’ summary to the effect that governments emphasizing service 
for the welfare of the populace demand more information about their services than do those 
pursuing other goals. Consistent with this thesis is the fact that the most substantial new national 
survey launched in the United States during the 1980s, a decade not noted for an emphasis 
by the federal government on expanding welfare services, was the Survey of Income and 
Program Participation, one of whose primary purposes has been to monitor the impact of 
government welfare programs on income and assets. Moreover, as the countries of Eastern 
Europe democratize and turn to the West for assistance in upgrading their statistical systems, 
including the development of infrastructures for the conduct of large scale surveys, we see addi- 
tional support for such a thesis. Thus it seems to us that Groves shares our belief that the 
institutional bases for survey research shape the content and direction of such surveys. Whether 
they provided homes or incubators for the best and the brightest seems to us akin to the nature/ 
nurture debate - more a framework for discussion than an either-or choice. Indeed, we agree 
with Groves that the purposes of the various sectors shaped their choice of tasks, at least in 
part. In line with his urging of a cross national perspective, however, we note that institutional 
roles differ across countries. For example, there has been a widely-held view in the United States 
that the Federal government should not be in the business of collecting survey data on subjec- 
tive phenomena (e.g., see Turner and Martin 1984, 31-39) - a quite different stance has been 
taken by the British government, especially in connection with its annual report, Social Trends 
(Turner and Martin 1984, p.4). 


Groves suggests that the membranes between sectors (academic, commercial, and govern- 
mental) are less permeable than we suggest. Neither we nor he have collected systematic 
empirical evidence on this question, but we point again to our concept of bridging institutions 
which bring together representatives of the various sectors, for the interchange of ideas if not 
personnel. And we hasten to point out that Groves’ own recent appointment to the position 
of Associate Director of the U.S. Bureau of the Census, as well as Hansen’s movement from 
that position into the commercial domain back in 1968, indicate the value, if not the ease, of 
membrane crossing. 


Groves indirectly speculates that we choose to focus on technological advances, longitudinal 
surveys, and cognitive aspects of surveys because these are our areas of interest and experience, 
and he suggests several other developments that are worthy of consideration. Of course he is 
correct in suggesting that we have focussed on the developments that fit with our interests, 
but surely technological advances as a topic subsumes Groves’ first two additional areas of 
importance: (1) development of generalized statistical software packages and (2) existence of 
survey data archives. We wonder, however, if the technological advances we both note, coupled 
with the ubiquity of surveys that we also both note, do not have negative as well as positive 
consequences. For example, the complex analyses of survey data by undergraduates (or indeed 
any beginners) using statistical software packages often show neither an understanding of the
data being analyzed nor the appropriateness of the packaged statistical methods used. 


The ubiquity of surveys is a consequence not only of the demand for information but also 
of the relative ease with which surveys can be carried out and the data analyzed given current 
technology. (And we believe that the availability of survey data for reanalysis will only increase 
with the advent and adoption of new storage technologies such as CD-ROM and optical disks). 
Such ease is a mixed blessing. As Groves notes, the 1980s have seen a growing problem of 
nonresponse in the United States, a pattern that manifested itself earlier and (so far) more 
seriously in Europe. We do not need to postulate a growing trend toward demands for privacy 
to explain this decline in response rates, though such a trend may well exist. We need only look 
at the major nonresponse problems currently being encountered in the conduct of the U.S. 
1990 decennial census, in both the mail-out-mail-back and in the door-to-door phases, to see 
evidence to support the contention that respondents are merely getting tired of being surveyed 
so frequently. 


Further, as Groves points out, survey research has not been central to the self-image of 
academe, because survey research has not fully evolved into a separate identifiable discipline, 
with specified standards and training criteria. Since there are no departments of survey research 
on university campuses, almost anyone who cares can mount a survey or carry out analyses 
of survey data. While some people do these tasks well, others do them poorly thereby giving 
the whole survey enterprise a bad name. Thus, if we are to present the optimistic report on 
the state of the survey enterprise in 2040 that Groves envisages, it seems to us that the innova- 
tions in education and training that neither he nor we are currently able to chronicle will have 
to become institutionalized. 


We are especially pleased to have Hansen’s embellishment on our brief account of the 
development of the survey enterprise in the U.S. government in the 1930s and 1940s. His com- 
ments supply some of the human drama that Groves says is lacking in our institutional focus. 


Hansen also expands on our account of the link between censuses and sampling and the 
introduction of self-enumeration into U.S. census-taking, which was guided by the study of
response errors. The major decline in completion rates for self-enumeration in the 1990 
decennial census suggests the need to reexamine the implications of the various components 
in the Hansen-Hurwitz-Marks-Mauldin model for non-sampling errors. In addition we note 
that as part of the 1990 census, the Bureau of the Census will mount a new Post-Enumeration 
Survey (PES) of 150,000 households whose results will be used to evaluate census coverage. 
The technological advances in computerized data management and in computer-based matching 
of files between the PES and the census were essential ingredients to the launching of this major 
new government survey and its planned use to measure both under-and over-coverage of the 
household-based population. 
BARBARA BAILAR


The comments from the discussants describe even more contributions of the Federal Govern- 
ment to the world of statistics. I am very grateful to Gordon Brackstone and Morris Hansen 
for mentioning these additional topics. The topic I omitted that may have had the biggest impact 
on statistics as well as other quantitative fields was the development of the computer for data 
processing and data analysis purposes. Again, the team of Hansen and Hurwitz were the prime 
movers, urging and funding the development of UNIVAC I and then bringing it into the Census. 


Morris Hansen describes the remarkable team at Census who worked with him and Bill 
Hurwitz on so many topics. I feel very fortunate that I began my career at the Census Bureau 
when these people were there and that I was able to work with most of them for many years. 
It is rare that one gets that kind of apprenticeship. 


Gordon Brackstone questions whether the statistical methodology developed by the Census 
Bureau had a benefit to the wider world of statistics. Certainly, given the amount of interaction 
among government statistical offices, the Bureau of the Census has influenced government 
statistical operations in other countries. Brackstone finds the impact of the Census Bureau 
development on university statistics departments rather mixed. He may be correct as far as 
course offerings are concerned, but I believe the ASA-NSF-Census Fellowship program and
the Agriculture Fellowship program have had a big impact. More university professors and 
graduate students are aware of and working on non-sampling error, disclosure avoidance, and 
time series problems. The recent addition of Fellowship programs at the Bureau of Labor
Statistics and the National Center for Education Statistics has also highlighted these research
areas. The NSF now receives many proposals based on research started at one of the government 
agencies. 


The main problem now is to make sure that research results are used. Many government 
programs are slow to accept new methodology because change is disruptive. Yet, to make sure 
that methods are improving, change is necessary. 
LESLIE KISH


In his fine discussion Fritz Scheuren complements our comparisons of alternative census 
methods by advocating administrative registers for the USA. I support his expert plea to study 
what these methods could offer as additions, as complements to the decennial censuses. They 
are coming to many countries and we would like to know where, when, and how? It is even 
likely that they will not only complement, but even replace decennial censuses soon in some 
places. When in the USA? I don’t know; we were comparatively slow and late in adopting a 
successful registry of births and deaths. And even now their reporting is rather slow. 


Rolling samples could be designed for quick reporting, and timeliness is only one of the 
advantages of rolling samples. Thus it is biased to compare rolling censuses with traditional 
censuses, both as regards costs and benefits, only on the basis of the single output for which 
decennial censuses are designed. It would take detailed, technical investigations to compare 
the factors of costs, coverage, timeliness, content, etc. of rolling versus decennial censuses in 
the USA. But 10 to 15 million dollars monthly can go far. The issue of adequate censuses is 
most salient in 1990 in the USA and elsewhere, but the other uses of samples should not be 
forgotten, as we plan for the last decade of our twentieth century. 


My contribution aims mainly to advance the diverse advantages of cumulations from periodic 
samples, which have been neglected in favor of the other benefits that can be obtained from 
the growing numbers of periodic surveys. Rolling censuses may become someday one of those 
benefits, and rolling samples have been used already - though not often enough, I believe. 
Asymmetrical cumulations may exist rarely and obscurely, and the split-panel designs that I 
propose, not at all. 


Furthermore my scope is not merely national (the USA), nor even continental (North 
America): it is intercontinental and international. For example, registers have come to the 
Nordic countries and they may come to Canada before the USA. Rolling censuses pose a much 
smaller expansion of the Labour Force survey in Canada because it is one-tenth the size of 
the USA, as Fritz and I both show. But some other country may well use them before either. 


Rolling samples and cumulations are meant to be not only international but also interdisciplinary,
serving more than population counts. A good many of the other needs of statistical
offices - and of other institutions for data collections! - would be better served by a trained
‘‘permanent’’ staff than by a hurriedly hired huge army whose training time roughly equals
their brief employment.


Scheuren is most complimentary when he calls rolling censuses a new paradigm. It is true 
that, as all new paradigms, they meet three big mental blocks when I present cumulations and 
rolling samples: a) averaging of variable data instead of an arbitrary date like April 1, of the 
decennial year; b) accepting some of the mobility of human populations instead of fixing them 
to unique sites; c) rolling samples to replace fixed primary sampling areas. So it may seem 
paradoxical when Morris Hansen notes that my ‘‘discussion of these does not propose anything 
new.’’ Hansen may have encountered all of these proposals, and perhaps dismissed some of 
them. Personally I have described rolling samples since at least 1961 and proposed rolling cen- 
suses since 1965. But I also found that for many people they come as new ideas, and often as 
strange new ideas. 
Finally let me only add two important origins in the ‘40’s for sampling, although for me
personalities and priorities are only minor aspects of the history of any science. Iowa State
at Ames should be mentioned, where, under George Snedecor and Henry Wallace, Bill Cochran 
started in the spring of 1939 the first course of sampling and turned out pioneer MA’s, then 
PhD’s in sampling. Then Henry Wallace (again) in the US Dept. of Agriculture started the 
Division of Program Surveys, hired me in 1941 and Steve Stock in 1942 for the first national 
samples in Washington in 1942, followed by the 1943 sample at the USBC. Stock, Frankel and 
Webb (from the WPA samples) began the second sampling course in fall 1939 at the USDA 
Graduate School, which became famous and productive under Hansen, Hurwitz and their
Census staff. Among influential courses there I shall testify especially to those of Deming, the 
major figure at the school. The teaching and learning of samples in the forties was done mostly 
at Ames and in the USDA, as well as at the USBC. 
Some Developments of Sampling Techniques and
their Use in Official Statistics in Sweden 


TORE DALENIUS and CARL-ERIK SÄRNDAL¹


In this paper we present some important features of the history of sample surveys in Sweden, 
and we comment on related developments of sampling techniques (methods and theory) in 
official statistics. The account is organized into three periods as follows: (i) before 1900; 
(ii) 1900-1950; and (iii) after 1950. The emphasis is on the third period. 


I. THE PERIOD BEFORE 1900 


1. A summary view. As described in Dalenius (1957), there was a noticeable resistance against
sample surveys in traditional fields of official statistics, especially among statisticians in 
leading positions. Sample surveys were considered justified primarily in cases where cir- 
cumstances did not admit total surveys. In other fields there were, however, signs of 
appreciation, as illustrated in the next section. 


2. Two classic illustrations. In the 1820’s, the area of meadowland in Sweden was estimated 
using the following technique. For each county separately, the ratio of meadow acreage 
to arable land was computed for a sample of farms. This ratio was then applied to the 
total arable land acreage of the county, for which a separate estimate was available. And 
in 1830, the proposal was made by an official in a forestry board to estimate the volume 
of timber in a forest by means of a ‘‘strip survey method’’. 
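
The meadowland technique is what would now be called a ratio estimator. A minimal Python sketch follows; the function name is hypothetical, and the choice of a ratio of sample totals (rather than an average of farm-level ratios) is an assumption, since the account above does not say which was used.

```python
def ratio_estimate_total(meadow, arable, arable_total):
    """Estimate a county's meadow acreage as
    (sample meadow total / sample arable total) * known arable total."""
    if len(meadow) != len(arable) or not meadow:
        raise ValueError("need matching, non-empty lists of farm acreages")
    ratio = sum(meadow) / sum(arable)  # meadow acreage per unit of arable land
    return ratio * arable_total


# Hypothetical figures for three sampled farms in one county.
sample_meadow = [10, 20, 20]
sample_arable = [40, 60, 100]
estimate = ratio_estimate_total(sample_meadow, sample_arable, arable_total=10000)
```

The estimate is only as good as the separate figure for the county's total arable land, which the technique takes as given.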


II. THE PERIOD 1900-1950 


3. The main features. The potential of sample surveys in official statistics was slowly being 
understood. To the extent that sample surveys were used during this period, the design 
typically called for systematic sampling, whenever this was operationally feasible. In many 
applications, the sampling fraction was 1/10 or 1/5. In the 1940’s, a major factor favouring 
total surveys was the war-time economy with its regulations and rationing. This influence, 
which lasted roughly until the end of that decade, was however counteracted by the 
introduction of Gallup polls into Sweden and especially by the spectacular accuracy of 
the Gallup Institute’s forecast of the 1944 election. In particular, these trends were followed 
with interest by official statisticians. 


4. The 1911 Forest Survey in Värmland. The essential feature of the design was that the
volume of timber was measured on sample plots along 10 meter wide strips covering the
area of Värmland. It is worth noting that the ‘‘representative characteristics’’ of the survey
were analysed by means of probability theory.
5. The 1911 Housing Survey in Göteborg. This survey was carried out by the municipal
statistical office in Göteborg. The selection of the sample of apartments was based on an
urn scheme. Each building in Göteborg was represented by a slip with identification data.
The slips were thoroughly mixed in an urn and a 20% sample of slips was selected. The 
motive behind the scheme was to avoid that the survey be criticized for using a biased 
sample. The urn scheme was described by the person in charge of the survey as the only 
method ‘‘which can be called representative’’. 


6. The 1935-36 Partial Population Census. This sample census used an elaborate scheme of
controlled selection. The results from this census played a decisive role in an intense debate
in Sweden concerning a ‘‘population crisis’’ which was feared as a result of low birth rates 
at the time. 


III. THE PERIOD AFTER 1950


7. The beginnings of a new era. The greatly improved international communications after
the end of World War II contributed to making the statistical community in Sweden aware
of the recent advancements in sample survey theory, methods, and applications in the 
United States and India, to mention two of the leading countries. The new developments 
were studied and discussed, for example, at the conference of the Scandinavian statisti- 
cians in Helsinki in 1949. Statisticians were proud to be able to ‘‘talk sample survey 
methods’’; to be sure, in some cases this ability was limited to knowledge of certain tech- 
nical terms, notably ‘‘stratification’’. Mention should also be made of the influence exer- 
cised by the United Nations and affiliated agencies such as the Food and Agriculture 
Organization. In the following we give some examples of sample surveys and related 
developments of methods and theory. For cases dating to the early 1950’s, details are found 
in Dalenius (1957). 


8. The 1950 sample inventory of acreages and livestock. In the 1930’s, sample surveys were
used to estimate acreages of various crops and animal stocks. These surveys were referred
to as ‘‘representative counts’’. They were based on nonprobability selection of farms. The 
aim, which however was not achieved, was to select 1/10 of the farms in each of several 
size-groups into which the farms had been divided. In the 1940’s, these surveys were carried 
out on a total basis. A decision was made for the 1950 survey to return to sampling. The 
design that was suggested and largely implemented for the 1950 survey represented a partial 
break with the classical tradition of selecting every tenth unit. While the total sample size 
was fixed by the government authorities to be 1/10 of the total number of farms in the 
target population, the new design called for stratifying the farms by size groups based on 
acreage and using minimum variance allocation, which implied a selection of relatively 
speaking more large farms than small farms. It is interesting to note that the government 
authorities responsible for assessing the design felt it necessary to consult the U.N. Sub- 
commission on Statistical Sampling about the appropriateness of the drastic deviation from 
the ‘‘every tenth unit rule’’. The Subcommission wholeheartedly endorsed the design. 
Consequently it was accepted in principle. The design provided considerable opportunity 
for research. In fact, three contributions to the theory of stratified sampling emerged, 
namely, (i) how best to divide a population into L strata; (ii) the best choice of the number 
of strata; and (iii) sample allocation to the strata for estimation of several parameters. The
suggested design also called for addressing the problem of ‘‘measurement errors’’ in the 
acreage, and a special calibration survey was proposed. However, the authorities rejected 
this proposal. 
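
To illustrate why minimum variance allocation selects relatively more large farms, here is a small Python sketch. It assumes the standard Neyman form for a single variable with equal unit costs, in which the stratum sample size is proportional to stratum size times stratum standard deviation; the account above does not state the formula actually used, and the strata, sizes and standard deviations below are invented for the example.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Allocate a total sample of n across strata in proportion to N_h * S_h.

    Returns unrounded sample sizes; rounding, and capping at N_h when a
    stratum would be over-sampled, are deliberately left out of this sketch.
    """
    weights = [N_h * S_h for N_h, S_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    return [n * w / total for w in weights]


# Hypothetical population of 10,000 farms in three size groups,
# with the overall sample fixed at 1/10 of the population.
sizes = [7000, 2500, 500]   # small, medium, large farms
sds = [5.0, 20.0, 100.0]    # assumed standard deviations of acreage
allocation = neyman_allocation(1000, sizes, sds)
fractions = [n_h / N_h for n_h, N_h in zip(allocation, sizes)]
```

Under these invented figures the sampling fraction rises steeply with farm size, while the overall fraction stays at 1/10, which is the kind of departure from the every tenth unit rule described above.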
9. Yield estimation. During World War II, the yield of various crops was estimated using
data collected by ‘‘eye estimates’’ of the yield per unit area. By 1950, it was realized that
this data collection method could be seriously biased. In the beginning of the 1950’s, time 
was ripe for considering a different approach, namely, crop estimation based on harvesting 
sample plots, referred to as ‘‘objective crop estimation’’. Accordingly, a pilot study was 
carried out to test the use of this approach. The outcome of the test was convincing. From 
then on, the ‘‘objective’’ method has been used. As part of the pilot survey design, a scheme
was developed for selecting n = 2 farms without replacement from a stratum with probabilities
proportional to size, as discussed in Dalenius (1953). The scheme called for
dividing each stratum at random into two parts, and selecting one farm from each part. 
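
A rough Python sketch of such a scheme is given below. Only the random split into two parts and the selection of one farm per part come from the description above; the equal-sized halves, the draw within each part with probability proportional to size, and the function name are assumptions made for illustration, and the details in Dalenius (1953) may differ.

```python
import random


def select_two_pps(sizes, rng):
    """Select two distinct units from a stratum: split the stratum at random
    into two parts, then draw one unit from each part with probability
    proportional to its size measure."""
    if len(sizes) < 2:
        raise ValueError("need at least two units in the stratum")

    units = list(range(len(sizes)))
    rng.shuffle(units)                    # random division of the stratum
    half = len(units) // 2
    parts = [units[:half], units[half:]]  # assumed: (near-)equal halves

    chosen = []
    for part in parts:
        weights = [sizes[u] for u in part]
        chosen.append(rng.choices(part, weights=weights, k=1)[0])
    return chosen


# Hypothetical size measures (e.g. acreage) for six farms in one stratum.
farm_sizes = [5, 10, 1, 40, 8, 3]
pair = select_two_pps(farm_sizes, random.Random(1))
```

Because the two parts are disjoint, the two selected farms are always distinct, which is what makes the selection without replacement.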


10. Developments relating to nonsampling errors. In the early 1950’s, the problem of nonresponse
received considerable attention in Sweden as in other countries. Surveys with
20-30% nonresponse were not unusual. This generated a vivid and sometimes heated debate 
in the statistical community about the distortion of the estimates. For a while, the sta- 
tisticians seemed to have the problem under their control. The public concern about inva- 
sion of privacy has lately changed this picture; nonresponse has again become a serious 
problem. In the last 15 years, several contributions were made in the area of control of 
nonsampling errors. The problem of ‘‘evasive answer bias’’, to use the term introduced 
by S. Warner in connection with randomized response, was addressed in Swensson (1976). 
And Lyberg (1981) successfully tackled the problem of controlling the coding operation
in a population census or in a survey with interviews. 


11. Respondent burden. In recent years there has been a growing concern about respondent
burden and its negative effects on response rates. For example, the target population in 
many business surveys is the same, rather limited population. The problem can be alleviated 
by special sample selection techniques. The SAMU system for business surveys at Statistics 
Sweden permits ‘‘negative coordination’’ of samples, in the sense that samples without 
overlap can be selected with the technique known as JALES. To each unit in the sampling 
frame, a uniformly distributed random number is attached. This number stays with the 
unit, and is used in the selection of samples over time. 
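
The idea of a random number that stays with each unit can be illustrated with a short Python sketch. This is not the JALES procedure itself, whose details are not given here; the function names, the seeding, and the rule of taking consecutive blocks of units in random-number order are assumptions chosen to show how non-overlapping samples can be obtained.

```python
import random


def assign_prns(unit_ids, seed):
    """Attach a uniform random number to every unit in the frame once;
    the same numbers are reused for every later selection."""
    rng = random.Random(seed)
    return {u: rng.random() for u in unit_ids}


def disjoint_samples(prns, sample_sizes):
    """Order units by their random numbers and hand out consecutive,
    non-overlapping blocks, one block per survey."""
    if sum(sample_sizes) > len(prns):
        raise ValueError("frame too small for non-overlapping samples")
    ordered = sorted(prns, key=prns.get)
    samples, start = [], 0
    for n in sample_sizes:
        samples.append(ordered[start:start + n])
        start += n
    return samples


# Hypothetical frame of ten businesses and two surveys of sizes 3 and 4.
frame_prns = assign_prns(range(10), seed=42)
survey_a, survey_b = disjoint_samples(frame_prns, [3, 4])
```

Because the numbers are fixed, repeating the selection gives the same samples, and no business is asked to respond to both surveys.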


12. Modeling in combination with traditional probability sampling principles. Since the 1950’s,
the methodology for surveys has closely followed the strong probability sampling tradition
established by Neyman and by Hansen and his co-workers in the United States. However, 
sometimes modeling is necessary in surveys when the traditional probability sampling 
theory is not sufficient. Since the 1970’s the use of modeling in surveys has been explored. 
The book Foundations of Inference in Survey Sampling by Cassel, Särndal and Wretman
(1977) exposed the new trends. Also, a number of papers by these and other Swedish 
authors showed how models may assist in inference from surveys. In recent years, method- 
ologists at Statistics Sweden have shown unusual openness to incorporating modeling in 
the making of survey estimates. An early example where design-based and model-based 
ideas were combined is the ‘‘Öresund survey’’ for measuring traffic flow between Sweden
and Denmark. The design is discussed in Cassel (1978). Some surveys are now designed 
with the aid of modeling assumptions, as in the work force survey described in Lundström
(1987) and in an ongoing project of restructuring of the business survey sector. 


13. Safeguarding privacy in surveys. In the last two decades, the general public has become
increasingly concerned about invasion of privacy in connection with surveys, including 
population censuses, carried out by Statistics Sweden. As a result, there has been a trend 
towards increasing nonresponse rates in some surveys. Several measures have been taken
to deal with the problem: (i) Statistics Sweden has adopted the Ethical Declaration of the 
International Statistical Institute (1986); a translation of that declaration was distributed 
to all employees; (ii) In 1987, Statistics Sweden held an international conference which 
focused on policy issues (as distinguished from ‘‘techniques’’); the discussions at the conference
are summarized in Statistics Sweden (1987); (iii) Statistics Sweden has promoted
the development of new safeguards for privacy in its surveys and has taken active steps 
to apply them. A review is given in Dalenius (1988). Of special interest are papers by Block 
and Olsson (1976), who describe a measure for the identifying power of quasi-identifiers, 
and Cassel (1976), who discussed probability-based disclosure. 


14. Specific events. The increasing appreciation of sample surveys since around 1950 led to 
the creation of the Survey Research Center at Statistics Sweden in 1953. A similar inter- 
pretation may be given to the establishment of a professorship in ‘‘statistics, especially 
official statistics’’ at the University of Stockholm in 1965. Also, professorships in survey 
methodology were recently created at Statistics Sweden. 


Variance Estimation when a First Phase Area 
Sample is Restratified 


PHILLIP S. KOTT¹ 


ABSTRACT 


This paper proposes an unbiased variance estimation formula for a two-phase sampling design used in 
many agricultural surveys. In this design, geographically defined primary sampling units (PSUs) are first 
selected via stratified simple random sampling; then secondary sampling units within sampled PSUs are 
restratified based on their characteristics and subsampled in a second phase of stratified simple random 
sampling. 


KEY WORDS: Two-phase sample; Primary sampling unit; Secondary sampling unit; Unbiased. 


1. INTRODUCTION 


Suppose we have a sample of geographically defined primary sampling units (PSUs) drawn 
from a stratified area frame. Each sampled PSU contains a number of secondary sampling 
units (SSUs) which are restratified based on their characteristics. Subsamples of the SSUs are 
then drawn within each new stratum. To avoid confusion, only the original area strata will 
hereafter be referred to as strata; the new strata based on SSU characteristics will be referred 
to as domains. Stratified simple random sampling (srs) without replacement is performed at 
both phases of the sampling design. 

This article derives an unbiased variance formula for the estimation strategy described above 
which is used in many agricultural surveys (for example, see Kott and Johnston 1988) but is 
not restricted to such surveys. The formula is a generalization of a suggestion by Cochran and 
Huddleston (1969, 1970), who assumed unstratified srs in the first sampling phase. It is also 
a special case of a variance formula in Särndal and Swensson (1987). The Särndal and Swensson 
formula (their equation (4.4)) depends on the calculation of a joint inclusion probability for 
each pair of subsampled SSUs. This proves cumbersome for the particular application under 
study because there are six distinct situations which need to be considered (depending on whether 
or not the two SSUs come from the same PSU, stratum, and/or domain). The derivation 
presented here follows a different line of reasoning entirely. 


2. PRELIMINARIES 


Suppose we start with an area survey consisting of $n_h$ (out of $N_h$) PSUs from each of $H$ 
strata. The SSUs within sampled PSUs are then restratified into $D$ domains. Within domain 
$d$, $m_d$ (out of $M_d$) SSUs are subsampled. Both phases of the sampling design are stratified srs 
without replacement. 

Let us concentrate on estimating the total for a particular item of interest. To this end, 
let 

$S'$ denote the set of all SSUs within a PSU selected in the first phase of sampling, whether 
these SSUs are in the subsample or not, 

$S_{hj}$ denote the set of subsampled SSUs in PSU $j$ of stratum $h$, 

$S_{h.}$ denote the set of all subsampled SSUs in stratum $h$, 

$R_d$ denote the set of all subsampled SSUs in domain $d$, 

$x_i$ denote the value of interest for SSU $i$, 

$e_i = (N_h/n_h)(M_d/m_d)x_i$ (assuming $i \in S_{h.} \cap R_d$) be the "fully expanded" value of interest 
for SSU $i$, 

$$e_{dhj} = \sum_{i \in S_{hj} \cap R_d} e_i, \qquad e_{dh.} = \sum_{i \in S_{h.} \cap R_d} e_i, \qquad e_{d..} = \sum_{i \in R_d} e_i,$$

$$e_{.hj} = \sum_{i \in S_{hj}} e_i, \quad \text{and} \quad e_{.h.} = \sum_{i \in S_{h.}} e_i.$$


Note that when $S_{hj}$ is empty, $e_{dhj}$ and $e_{.hj}$ are zero. Likewise when $S_{h.}$ is empty, $e_{dh.}$ and $e_{.h.}$ 
are zero, and when $R_d$ is empty $e_{dhj}$, $e_{dh.}$ and $e_{d..}$ are zero. 


An unbiased estimator for $X$, the sum of $x_i$ values across all SSUs in the population, is 

$$\hat{X} = \sum_{d=1}^{D} \sum_{i \in R_d} e_i. \qquad (1)$$

To see this, observe that $\tilde{X} = \sum_{i \in S'} (N_h/n_h)x_i$ is an unbiased estimator of $X$ with respect to 
the first phase of sampling, while $\hat{X}$ is an unbiased estimator of $\tilde{X}$ with respect to the second 
sampling phase. Mathematically, $E_1(\tilde{X}) = X$ and $E_2(\hat{X}) = \tilde{X}$, which implies 
$E(\hat{X}) = E_1[E_2(\hat{X})] = X$. 
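
As a minimal sketch of how the fully expanded values and the estimator in equation (1) could be computed, the following assumes each subsampled SSU is recorded with its stratum, its domain and its value. The function names and the toy numbers are hypothetical and do not come from the paper.

```python
from collections import Counter


def expanded_values(subsample, N, n, M):
    """Return e_i = (N_h/n_h)(M_d/m_d) x_i for each subsampled SSU.

    subsample: list of (stratum h, domain d, value x_i) tuples.
    N, n: PSUs in the frame / in the first phase sample, keyed by stratum.
    M: SSUs in each domain among the sampled PSUs, keyed by domain.
    """
    m = Counter(d for _, d, _ in subsample)  # m_d = subsample size in domain d
    return [(N[h] / n[h]) * (M[d] / m[d]) * x for h, d, x in subsample]


def estimate_total(subsample, N, n, M):
    """Equation (1): the sum of the fully expanded values."""
    return sum(expanded_values(subsample, N, n, M))


# Hypothetical toy data (illustrative only).
toy = [("h1", "d1", 10.0), ("h1", "d2", 4.0), ("h1", "d1", 6.0),
       ("h2", "d1", 8.0), ("h2", "d2", 2.0), ("h2", "d2", 5.0)]
N_h = {"h1": 20, "h2": 30}   # PSUs per stratum in the frame
n_h = {"h1": 2, "h2": 2}     # PSUs per stratum in the first phase sample
M_d = {"d1": 6, "d2": 9}     # SSUs per domain among sampled PSUs

print(estimate_total(toy, N_h, n_h, M_d))
```

Each SSU is weighted first by the inverse of its first phase sampling fraction and then by the inverse of its domain's subsampling fraction, so the sum estimates the population total.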


3. VARIANCE OF $\hat{X}$ 


From any of a number of textbooks on sampling theory (e.g., Cochran 1977, p. 276), we 
know that the variance of a two-phase estimator like $\hat{X}$ is 

$$\mathrm{var}(\hat{X}) = \mathrm{var}_1[E_2(\hat{X})] + E_1[\mathrm{var}_2(\hat{X})], \qquad (2)$$

where $E_k$ and $\mathrm{var}_k$ denote, respectively, expectation and variance with respect to the $k$th phase 
of sampling. 
The first term in equation (2) is often called the first phase variance because it equals the 
variance that would be obtained if every SSU within a sampled PSU were part of the subsample. 
The second term in (2) is often called the second phase variance. It is easier to estimate than 
the first phase variance and we will attack it first. The problem with first phase variance 
estimation is that the total value of interest for a PSU in the first phase sample can only be estimated 
using the subsample. As is well known, putting an estimated PSU total in place of a real total 
in the usual one-phase variance formula biases the resulting estimator. 


3.1 Second Phase Variance Estimation 


An unbiased estimator of $\mathrm{var}_2(\hat{X})$ given any original sample is automatically an unbiased 
estimator of $E_1[\mathrm{var}_2(\hat{X})]$. To see this, suppose that $v_2$ is an unbiased estimator of $\mathrm{var}_2(\hat{X})$ 
given any sample. Since $E_2[v_2 - \mathrm{var}_2(\hat{X})] = 0$ for every possible $S'$, the first phase 
expectation of $E_2[v_2 - \mathrm{var}_2(\hat{X})]$ must also be zero. Consequently, $E(v_2) = E_1E_2(v_2) = 
E_1[\mathrm{var}_2(\hat{X})]$. 

Now given our particular $S'$, 

$$\mathrm{var}_2 = \sum_{d=1}^{D} (1 - m_d/M_d)\,(m_d/(m_d - 1)) \Big[ \sum_{i \in R_d} e_i^2 - e_{d..}^2/m_d \Big] \qquad (3)$$

is the conventional unbiased estimator for $\mathrm{var}_2(\hat{X})$. Moreover, equation (3) would hold 
whatever first phase sample was obtained. As a result, $\mathrm{var}_2$ is also an unbiased estimator for 
$E_1[\mathrm{var}_2(\hat{X})]$. 
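
The second phase variance estimator in equation (3) can be sketched as follows, assuming the fully expanded values have already been grouped by domain and that every domain has at least two subsampled SSUs. The function name and the toy inputs are hypothetical.

```python
def second_phase_variance(e_by_domain, M):
    """Equation (3): stratified-srs variance estimator over the D domains.

    e_by_domain: dict mapping domain d to the list of expanded values e_i
                 of the SSUs subsampled in that domain (m_d = list length).
    M: dict mapping domain d to M_d, the number of SSUs available in d.
    """
    total = 0.0
    for d, e in e_by_domain.items():
        m = len(e)                       # m_d, assumed to be at least 2
        fpc = 1 - m / M[d]               # finite population correction
        ss = sum(v * v for v in e) - sum(e) ** 2 / m
        total += fpc * (m / (m - 1)) * ss
    return total


# Hypothetical expanded values for two domains (illustrative only).
toy_e = {"d1": [200.0, 120.0, 240.0], "d2": [120.0, 90.0, 225.0]}
toy_M = {"d1": 6, "d2": 9}
print(second_phase_variance(toy_e, toy_M))
```

The bracketed term is the usual corrected sum of squares within a domain, written in terms of the expanded values rather than the raw values.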


3.2 First Phase Variance Estimation 


Consider a PSU $j$ within stratum $h$. The value $e_{.hj}$ is an unbiased estimator of $(N_h/n_h)$ 
times the total value among all SSUs in PSU $j$ whether in the current subsample or not. 
Consequently, $E_2(e_{.hj})$ is exactly equal to $(N_h/n_h)$ times the total value among all SSUs in 
PSU $j$. With this in mind, the following would be an unbiased estimator of the first phase 
variance of $\hat{X}$: 

$$\mathrm{var}_1[E_2(\hat{X})] = \sum_{h=1}^{H} (1 - n_h/N_h)\,(n_h/(n_h - 1)) \Big[ \sum_{j=1}^{n_h} \{E_2(e_{.hj})\}^2 - \{E_2(e_{.h.})\}^2/n_h \Big]. \qquad (4)$$

Taken as is, equation (4) is of little use since it supposes we know what the $\{E_2(e_{.hj})\}^2$ and 
$\{E_2(e_{.h.})\}^2$ are. Nevertheless, it does suggest that $\mathrm{var}_1[E_2(\hat{X})]$ would be estimated in an 
unbiased manner if one could find unbiased estimators for the $\{E_2(e_{.hj})\}^2$ and $\{E_2(e_{.h.})\}^2$ 
to plug into (4). 

Observe first that $e_{.hj}^2$ and $e_{.h.}^2$ are not unbiased estimators of $\{E_2(e_{.hj})\}^2$ and 
$\{E_2(e_{.h.})\}^2$. In fact, 

$$E_2(e_{.hj}^2) = \{E_2(e_{.hj})\}^2 + \mathrm{var}_2(e_{.hj}),$$
while 
$$E_2(e_{.h.}^2) = \{E_2(e_{.h.})\}^2 + \mathrm{var}_2(e_{.h.}). \qquad (5)$$

These equations hint towards alternative estimators for $\{E_2(e_{.hj})\}^2$ and $\{E_2(e_{.h.})\}^2$. If 
$v_{2,hj}$ and $v_{2,h.}$, say, were unbiased estimators of $\mathrm{var}_2(e_{.hj})$ and $\mathrm{var}_2(e_{.h.})$, respectively, then 
$e_{.hj}^2 - v_{2,hj}$ would be an unbiased estimator of $\{E_2(e_{.hj})\}^2$ while $e_{.h.}^2 - v_{2,h.}$ would be an 
unbiased estimator of $\{E_2(e_{.h.})\}^2$. 


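From Cochran (1977, p. 143, eq. (5A.68)), one can see that 

$$\mathrm{var}_{2,hj} = \sum_{d=1}^{D} (1 - m_d/M_d)\,[m_d/(m_d - 1)] \Big[ \sum_{i \in S_{hj} \cap R_d} e_i^2 - e_{dhj}^2/m_d \Big]$$
and 
$$\mathrm{var}_{2,h.} = \sum_{d=1}^{D} (1 - m_d/M_d)\,[m_d/(m_d - 1)] \Big[ \sum_{i \in S_{h.} \cap R_d} e_i^2 - e_{dh.}^2/m_d \Big] \qquad (6)$$

are, respectively, unbiased estimators of $\mathrm{var}_2(e_{.hj})$ and $\mathrm{var}_2(e_{.h.})$. 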


3.3 Putting It All Together 


Observe that combining equations (3) and (6) can yield (after some manipulation) this 
estimator for the second phase variance of $\hat{X}$: 

$$\mathrm{var}_2 = \sum_{h=1}^{H} \Big\{ [n_h/(n_h - 1)] \sum_{j=1}^{n_h} \mathrm{var}_{2,hj} - \mathrm{var}_{2,h.}/(n_h - 1) \Big\} + \sum_{d=1}^{D} (1 - m_d/M_d)\,[1/(m_d - 1)] \Big( \sum_{h=1}^{H} [n_h/(n_h - 1)] \Big[ \sum_{j=1}^{n_h} e_{dhj}^2 - e_{dh.}^2/n_h \Big] - e_{d..}^2 \Big). \qquad (7)$$

By plugging $e_{.hj}^2 - \mathrm{var}_{2,hj}$ and $e_{.h.}^2 - \mathrm{var}_{2,h.}$ respectively into $\{E_2(e_{.hj})\}^2$ and $\{E_2(e_{.h.})\}^2$ 
of equation (4), we have the following estimator for the first phase variance of $\hat{X}$: 

$$\mathrm{var}_1[E_2(\hat{X})] = \sum_{h=1}^{H} (1 - n_h/N_h)\,(n_h/(n_h - 1)) \Big[ \sum_{j=1}^{n_h} \{e_{.hj}^2 - \mathrm{var}_{2,hj}\} - \{e_{.h.}^2 - \mathrm{var}_{2,h.}\}/n_h \Big].$$

This can then be added to (7) to yield the following estimator for the variance of $\hat{X}$ in (1): 

$$\mathrm{var} = A + B + C, \qquad (8)$$
where 

$$A = \sum_{h=1}^{H} [n_h/(n_h - 1)] \Big[ \sum_{j=1}^{n_h} e_{.hj}^2 - e_{.h.}^2/n_h \Big],$$

$$B = \sum_{d=1}^{D} (1 - m_d/M_d)\,[1/(m_d - 1)] \Big( \sum_{h=1}^{H} [n_h/(n_h - 1)] \Big[ \sum_{j=1}^{n_h} e_{dhj}^2 - e_{dh.}^2/n_h \Big] - e_{d..}^2 \Big),$$

$$C = -\sum_{h=1}^{H} f_h\,[n_h/(n_h - 1)] \Big[ \sum_{j=1}^{n_h} \{e_{.hj}^2 - \mathrm{var}_{2,hj}\} - \{e_{.h.}^2 - \mathrm{var}_{2,h.}\}/n_h \Big],$$

$f_h = n_h/N_h$ is the first phase sampling fraction in stratum $h$, and $\mathrm{var}_{2,hj}$ and $\mathrm{var}_{2,h.}$ are defined 
by equation (6). 


Observe that if all the first phase sampling fractions are very small, then the contribution 
of $C$ to (8) can be ignored. In any event dropping $C$ would at worst give var an upward bias, 
since $E(C) \le 0$. 

Observe further that var would collapse to $A$ if, in addition to $C$ being ignorably small, 
the sampling design had been conventional two-stage sampling; that is, if each domain had 
been contained within one of the originally sampled PSUs so that $e_{d..} = e_{dhj} = e_{dh.}$ and 
$B = 0$. This should not be surprising, since $A$ is the standard variance estimator in two stage 
sampling when the first stage is srs with replacement (Cochran 1977, p. 307). Ignorable first 
stage sampling fractions blur the distinction between srs with and without replacement. 

The right hand side of (8) can, in principle, be negative. This is because $B$ is often negative 
(since $e_{d..} \ge e_{dh.} \ge e_{dhj}$), while $A$ can theoretically be as small as zero. Kott and Johnston 
(1988) applied a formula similar to (6) to data from a US Department of Agriculture survey. 
In the 41 cases they examined the absolute value of $B$ was always less than 7% of $A$. 

One final note. Since $B \le 0$ and $E(C) \le 0$, using $A$ alone provides a conservative, 
unambiguously nonnegative, estimate for $\mathrm{var}(\hat{X})$. 
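
The three components of the variance estimator can be sketched in code. The sketch below assumes that B uses the term $e_{dh.}^2/n_h$, which is the form that makes the second phase pieces add up to the conventional domain-stratified estimator, and that every $n_h$ and $m_d$ is at least 2. The function name, record layout and toy data are hypothetical.

```python
from collections import defaultdict


def kott_variance(records, N, n, M):
    """Return (A, B, C) for records of (stratum h, PSU j, domain d, e_i).

    N, n: PSUs in the frame / first phase sample by stratum.
    M: number of SSUs available in each domain.
    """
    m = defaultdict(int)
    e_hj, e_h = defaultdict(float), defaultdict(float)
    e_dhj, e_dh, e_d = defaultdict(float), defaultdict(float), defaultdict(float)
    q_dhj, q_dh = defaultdict(float), defaultdict(float)  # sums of e_i^2
    for h, j, d, e in records:
        m[d] += 1
        e_hj[h, j] += e
        e_h[h] += e
        e_dhj[d, h, j] += e
        e_dh[d, h] += e
        e_d[d] += e
        q_dhj[d, h, j] += e * e
        q_dh[d, h] += e * e

    def c(d):  # (1 - m_d/M_d) m_d/(m_d - 1)
        return (1 - m[d] / M[d]) * m[d] / (m[d] - 1)

    # Second phase variance estimates for PSU and stratum totals.
    var2_hj, var2_h = defaultdict(float), defaultdict(float)
    for (d, h, j), q in q_dhj.items():
        var2_hj[h, j] += c(d) * (q - e_dhj[d, h, j] ** 2 / m[d])
    for (d, h), q in q_dh.items():
        var2_h[h] += c(d) * (q - e_dh[d, h] ** 2 / m[d])

    A = B = C = 0.0
    for h in n:
        r = n[h] / (n[h] - 1)
        ss = sum(v * v for (hh, _), v in e_hj.items() if hh == h)
        sv = sum(v for (hh, _), v in var2_hj.items() if hh == h)
        A += r * (ss - e_h[h] ** 2 / n[h])
        C -= (n[h] / N[h]) * r * ((ss - sv) - (e_h[h] ** 2 - var2_h[h]) / n[h])
    for d in list(m):
        inner = -e_d[d] ** 2
        for h in n:
            ss = sum(v * v for (dd, hh, _), v in e_dhj.items()
                     if dd == d and hh == h)
            inner += n[h] / (n[h] - 1) * (ss - e_dh[d, h] ** 2 / n[h])
        B += (1 - m[d] / M[d]) / (m[d] - 1) * inner
    return A, B, C


# Hypothetical toy records (illustrative only).
toy_records = [("h1", "p1", "d1", 200.0), ("h1", "p1", "d2", 120.0),
               ("h1", "p2", "d1", 120.0), ("h2", "p3", "d1", 240.0),
               ("h2", "p3", "d2", 90.0), ("h2", "p4", "d2", 225.0)]
A, B, C = kott_variance(toy_records, {"h1": 20, "h2": 30},
                        {"h1": 2, "h2": 2}, {"d1": 6, "d2": 9})
print(A, B, C)
```

With so few SSUs spread across every domain, the toy B is far larger in magnitude than A, so the sum is negative. That is the situation the text warns about; in a realistic survey A alone is the conservative choice.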


Estimation Using Double Sampling and 
Dual Stratification 


DONALD B. WHITE¹ 


ABSTRACT 


The problem considered is that of estimation of the total of a finite population which is stratified at 
two levels: a deeper level which has low intrastratum variability but is not known until the first phase 
of sampling, and a known pre-stratification which is relatively effective, unit by unit, in predicting the 
deeper post-stratification. As an important example, the post-stratification may define two groups 
corresponding to responders and non-responders in the situation of two-phase sampling for non-response. 
The estimators of Vardeman and Meeden (1984) are employed in a variety of situations where different 
types of prior information are assumed. In a general case, the standard error relative to that of the usual 
methods is studied via simulation. In the situation where no prior information is available and where 
proportional sampling is employed, the estimator is unbiased and its variance is approximated. Here, 
the variance is always lower than that of the usual double sampling for stratification. Also, without prior 
information, but with non-proportional sampling, using a slight modification of the second phase 
sampling plan, an unbiased estimator is found along with its variance, an unbiased estimator of its 
variance, and an optimal allocation scheme for the two phases of sampling. Finally, applications of these 
methods are discussed. 


KEY WORDS: Two-phase sampling; Prior information; Variance estimation; Optimal allocation; 
Non-response. 


1. INTRODUCTION 


Various stratified sampling designs employ various types of prior information. For example, 
the usual stratification model assumes full prior knowledge of individual stratum memberships. 
Post-stratification is useful when there is global information on stratum sizes but no informa- 
tion on individuals. Double sampling for stratification, on the other hand, assumes no prior 
information on strata. Further, some knowledge of the population values is necessary, for 
example, for the allocation of sampling resources among strata (see, for example, Cochran 
1977, pp. 96-99 and 331-332). 

The rigid assumptions inherent to these sampling designs and population models often are 
not satisfied due to the discrepancy between the population under study and the (possibly dated) 
prior information. Seeking to appropriately handle this discrepancy, Vardeman and Meeden 
(1984) have introduced a pair of estimators which combine information on stratum member- 
ships, stratum sizes, and stratum averages with analogous information gained from the cur- 
rent sample. Their two estimators apply to two essentially different situations. The first is where 
the prior information is global only, i.e., only on stratum sizes and averages. The second 
estimator applies where there is also partial information on individual stratum memberships. 
Here, the population is stratified according to various factors, some of which are known and 
some of which, though not known, may be inexpensive to determine on a first phase of 
sampling. 


¹ Donald B. White, Department of Statistics, State University of New York at Buffalo, 249 Farber Hall, Buffalo, 
New York 14214. 


As an example, consider the use of sampling to determine the spread of an infectious disease. 
If detection of infection is expensive, then stratification, according to risk categories, is desirable 
to reduce the second phase sample size. Factors determining risk categories may include gender, 
age, place of residence, ethnicity, health habits, and contact with potential carriers. As some 
of these factors are not known prior to sampling, the model of Vardeman and Meeden can 
be employed since the true risk categories can be predicted by the known factors. 

Another example is two-phase sampling for non-response. Extending the method of Hansen 
and Hurwitz (1946), we have a population which is divided into two post-strata, i.e., responders 
and non-responders. The methods discussed here apply when there is some prior information 
which classifies units into pre-strata which are then used to predict whether or not the unit will 
be in the group of responders. 

The notion of employment of prior information in two-phase designs is not without 
precedent in the sampling literature. As an example, Han (1973) has used prior information on an 
auxiliary regression variable (to be measured in a first phase sample) to construct a simple 
hypothesis (say $H_0$) regarding the mean of that variable. The first phase sample measurements 
are then used to test $H_0$. If $H_0$ is accepted, the value specified by $H_0$ is used in the estimator; 
if it is rejected, the sample average is used. 

A discussion of the use of the first estimator of Vardeman and Meeden (global informa- 
tion only) can be found in White (1987). There, optimal choices of the weighting constants 
for prior information relative to the information contained in the current sample were 
determined. Here, the situation considered is where prior information is also available on 
individual stratum memberships. After introducing the necessary notation in Section 2, we 
explore a simulated example in Section 3. In Section 4, in two different sampling situations, 
unbiased estimators are analyzed in terms of variance, unbiased estimation of the variance, 
and optimal allocation of sampling resources. In Section 5, applications of these techniques 
are discussed. 


2. THE POPULATION MODEL AND SAMPLING SCHEME 


We now present the population model and the proposed sampling design. We begin with 
a finite population $P$ of units labelled $1, 2, \ldots, N$ with associated unknown values 
$y_1, y_2, \ldots, y_N$. Denote the population total by $\tau = \sum_{i=1}^{N} y_i$. For $1 \le i \le N$, unit $i$ also possesses 
an unknown post-stratum membership $j_i$, $1 \le j_i \le J$, and a known pre-stratum membership 
$k_i$, $1 \le k_i \le K$. 

A variety of population quantities require a specialized notation. Such quantities include 
sizes of groups, group averages and group variances. Subscripts will identify the group 
involved: no subscript implies reference to the entire population, "$k\cdot$" refers to pre-stratum 
$k$, $1 \le k \le K$, "$\cdot j$" refers to post-stratum $j$, $1 \le j \le J$, and the subscript "$kj$" refers to 
the intersection of pre-stratum $k$ with post-stratum $j$. The base symbols $N$, $\bar{Y}$ and $S^2$ refer 
to number of elements, $y$-average, and finite population variance, respectively. Also, we let 
$P$, $P_{k\cdot}$, $P_{\cdot j}$ and $P_{kj}$ denote the subsets of $P$ corresponding to the four categories given above. 
For example, we have 


$$S_{k\cdot}^2 = \frac{1}{N_{k\cdot} - 1} \sum_{i \in P_{k\cdot}} (y_i - \bar{Y}_{k\cdot})^2.$$
Also, we can write 

$$\tau = \sum_{j} N_{\cdot j}\,\bar{Y}_{\cdot j}. \qquad (1)$$

We finally let $W_{kj} = N_{kj}/N_{k\cdot}$, i.e., $W_{kj}$ is the proportion of units in pre-stratum $k$ which fall 
into true stratum $j$. 

We now discuss the sampling technique. In the first phase of sampling, a stratified simple 
random sample without replacement $s'$ is selected, with $n'_{k\cdot}$ units (first phase sampling 
fraction denoted by $f_{k\cdot} = n'_{k\cdot}/N_{k\cdot}$) selected from pre-stratum $k$. Samples from different 
pre-strata are independent. For these $n' = \sum_k n'_{k\cdot}$ units, post-strata, $j_i$, are observed. Following 
the notational pattern given above, we let $n'_{kj}$ denote the number of units in $s'$ sampled from 
pre-stratum $k$ which happen to fall in post-stratum $j$. Also, $n'_{\cdot j} = \sum_k n'_{kj}$ is the total number 
of units in $s'$ which fall in post-stratum $j$. This set of units is denoted by $s'_{\cdot j}$. These quantities 
are observed, while quantities involving $y$-values, such as $\bar{y}'$ and $s^{2\prime}$ (with all four types of 
subscripts), remain unobserved. Here, and in the following, the average of any empty 
collection is taken as zero, and, if the size of a group is one or zero, we take its variance $s^2$ to be 
zero. We note that for $1 \le k \le K$, the random vectors $(n'_{k1}, \ldots, n'_{kJ})$ are independent with 
each possessing a multivariate hypergeometric distribution. 

For the second phase of sampling, we partition $s'$ into $\bigcup_{j=1}^{J} s'_{\cdot j}$, i.e., by post-stratification. 
For each $j$, let $\nu_j(\cdot)$ denote a known function on and into the non-negative integers with 
$\nu_j(0) = 0$ and $1 \le \nu_j(x) \le x$ if $x \ge 1$. The second phase sample $s$ is also stratified, but now 
is a subsample of $s'$ and stratified according to the post-stratification. The sample from $s'_{\cdot j}$ 
is denoted $s_{\cdot j}$ and is of size $n_{\cdot j} = \nu_j(n'_{\cdot j})$. Here, $y$-values are observed, yielding quantities such 
as $\bar{y}_{\cdot j}$ and $s_{\cdot j}^2$, the $y$-average and finite population variance of the units in the phase two sample 
and stratum $j$. 

The estimates of $\tau$ given by Vardeman and Meeden include the option of inclusion of prior 
guesses for the relative stratum sizes within each pre-stratum and for the stratum averages. 
Thus, we have prior guesses for the values $W_{kj}$ and $\bar{Y}_{\cdot j}$ which are given by $\Pi_{kj}$ and $\mu_{\cdot j}$, 
respectively. In the estimator introduced below, these guesses are given weighting constants which 
reflect the confidence in the guess relative to the confidence in the corresponding information 
yielded by the current sample. For each $k$, the confidence value allotted to the collection 
$(\Pi_{k1}, \ldots, \Pi_{kJ})$ is denoted $M_{k\cdot} \in [0, \infty]$ and for each $j$, the confidence value given to $\mu_{\cdot j}$ is 
denoted $M_{\cdot j} \in [0, \infty]$. In the current sample, the collection $(W_{k1}, \ldots, W_{kJ})$ is estimated by 
$(n'_{k1}/n'_{k\cdot}, \ldots, n'_{kJ}/n'_{k\cdot})$ and is based on a simple random sample of size $n'_{k\cdot}$. Thus, the 
confidence in $\Pi_{kj}$, say, as opposed to $n'_{kj}/n'_{k\cdot}$, is reflected by the size of $M_{k\cdot}$ versus that of $n'_{k\cdot}$. 
Similarly, in the current sample, $\bar{Y}_{\cdot j}$ is estimated by $\bar{y}_{\cdot j}$ and is based on a sample of size $n_{\cdot j}$; 
thus, the relative confidence in the prior guess and the current estimate is reflected by the relative 
sizes of $M_{\cdot j}$ and $n_{\cdot j}$. Any confidence weight for prior information equal to zero corresponds 
to no use of the prior information, and, as in the use of stratum sizes in the usual 
post-stratification model, a value of infinity implies no use of the corresponding information in the current 
sample. 

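Using the prior guesses, current estimates and confidence weights, we estimate $W_{kj}$ and $\bar{Y}_{\cdot j}$ 
by $\hat{\Pi}_{kj} = (M_{k\cdot}\Pi_{kj} + n'_{kj})/(M_{k\cdot} + n'_{k\cdot})$ and $\hat{\mu}_{\cdot j} = (M_{\cdot j}\mu_{\cdot j} + n_{\cdot j}\bar{y}_{\cdot j})/(M_{\cdot j} + n_{\cdot j})$, 
respectively. Finally, an estimate $\hat{\tau}$ of the population total $\tau$ is constructed by replacing in the 
formula (1) for $\tau$ any unobserved quantity by its estimate given above. Thus, we employ 

$$\hat{\tau} = \sum_{j=1}^{J} \Big\{ \sum_{k=1}^{K} N_{k\cdot}\,\hat{\Pi}_{kj} \Big\} \hat{\mu}_{\cdot j}. \qquad (2)$$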
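
The weighted combination of prior guesses and current-sample estimates can be sketched as below. The sketch assumes that equation (2) is the plain substitution the text describes, with each $N_{\cdot j}$ in formula (1) replaced by $\sum_k N_{k\cdot}\hat{\Pi}_{kj}$ and each $\bar{Y}_{\cdot j}$ by $\hat{\mu}_{\cdot j}$. The function names, argument layout and example counts are hypothetical.

```python
import math


def shrink(prior, weight, current, size):
    """(weight*prior + size*current) / (weight + size).

    weight = 0 ignores the prior; weight = inf ignores the current sample.
    """
    if math.isinf(weight):
        return prior
    if weight + size == 0:
        return 0.0
    return (weight * prior + size * current) / (weight + size)


def tau_hat(N_k, n1, Pi, M_k, ybar, n2, mu, M_j):
    """Estimate the population total by substituting estimates into formula (1).

    N_k[k]: pre-stratum sizes; n1[k][j]: first phase counts n'_kj;
    Pi[k][j], M_k[k]: prior guesses for W_kj and their weights;
    ybar[j], n2[j]: second phase averages and sample sizes;
    mu[j], M_j[j]: prior guesses for the stratum averages and their weights.
    """
    total = 0.0
    for j in range(len(ybar)):
        N_j_hat = 0.0
        for k in range(len(N_k)):
            n1_k = sum(n1[k])
            share = n1[k][j] / n1_k if n1_k else 0.0
            N_j_hat += N_k[k] * shrink(Pi[k][j], M_k[k], share, n1_k)
        total += N_j_hat * shrink(mu[j], M_j[j], ybar[j], n2[j])
    return total


# Hypothetical first phase counts and second phase averages (illustrative only).
N_k = [5000, 5000]
n1 = [[400, 80, 20], [20, 80, 400]]
W = [[0.8, 0.16, 0.04], [0.04, 0.16, 0.8]]   # prior guesses Pi_kj
ybar = [0.01, 0.1, 0.5]
n2 = [420, 160, 420]
no_prior = tau_hat(N_k, n1, W, [0, 0], ybar, n2, [0, 0, 0], [0, 0, 0])
print(no_prior)
```

Because the weights enter as pseudo sample sizes, a weight equal to the size of the sample behind the prior guess gives the prior and the current sample equal footing per unit.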
Computation of the bias and variance of $\hat{\tau}$ in the general case is left open by Vardeman and 
Meeden. The case $K = 1$ and $M_{\cdot j} = 0$, $1 \le j \le J$, has been studied in White (1987). Before 
proceeding to a result in a more complex situation, we first explore the results of a simulation 
on a hypothetical population. 


3. A MONTE CARLO STUDY 


Here we present a specific population and sampling scheme which is modelled after the 
introductory example regarding estimation of the spread of an infectious disease. For a popula- 
tion of 10,000 individuals who are susceptible, the disease is assumed to be more prevalent 
among the 5,000 who live in the western section of the area considered. Since this is a known 
characteristic, the population is partitioned according to the east-west boundary into K = 2 
pre-strata. Next, we assume that certain easily obtained additional information enables the 
sampler to categorize the individual as low, medium, or high risk for becoming infected. See 
Table 1 for the details of the construction of the population. 

For estimation of the total number infected ($\tau = 2302$), we assume no prior knowledge of 
the stratum proportions $\bar{Y}_{\cdot 1}$, $\bar{Y}_{\cdot 2}$, and $\bar{Y}_{\cdot 3}$ and thus take $M_{\cdot 1} = M_{\cdot 2} = M_{\cdot 3} = 0$. There 
remain four major ingredients to the estimation process: 1) the prior guesses 
$\{\Pi_{kj}: k = 1, 2,\ j = 1, 2, 3\}$ for the distribution of individuals from pre-strata to post-strata, 2) the 
weighting constants $M_{1\cdot}$ and $M_{2\cdot}$ given to these prior guesses, 3) the first phase sample design 
and outcome, and 4) the second phase sample design and outcome. These are detailed in the 
following. 

First, in White (1987) it was found for the $K = 1$ case that an effective choice of weighting 
constants was to select $M$ equal to the sample size on which the previous information was based. 
Following that notion, we allowed, for each simulation, the collection $\{\Pi_{kj}\}$ to select itself 
through a preliminary sample of size $m$ (either 500 or 2500) from each pre-stratum. That is, 
$\Pi_{kj}$ is taken to be the proportion of the $m$ individuals from pre-stratum $k$ falling in 
post-stratum $j$. 

Second, for each run, the weighting constants were taken as $M_{1\cdot} = M_{2\cdot} = M$ for all 
$M \in \{0, 100, 200, 300, \ldots, 10{,}000, \infty\}$. Recall that $M = \infty$ corresponds to the situation of the 
usual post-stratification where no use is made of the current sample to estimate group sizes. 

Third, the first phase sample is stratified according to pre-strata with sampling fractions 
$f_{k\cdot}$ taken to be $f_{1\cdot} = f_{2\cdot} = f$, $f \in \{.10, .20, .30, .40, .50\}$. Recall that in this phase of 
sampling, only post-stratification is observed. This information is, presumably, inexpensive 
to obtain. 


Table 1 
Number Infected/Group Size for the Pre-strata and Post-strata Combinations 

Location of                        Risk Group 
Residence         Low (j = 1)   Medium (j = 2)   High (j = 3)     Total 

East (k = 1)        40/4000        80/800          100/200       220/5000 
West (k = 2)         2/200         80/800         2000/4000     2082/5000 
Total               42/4200       160/1600        2100/4200     2302/10000 
On the other hand, sampling a unit in phase 2, where the presence of infection is determined, 
is assumed to be rather expensive. The individuals selected are a subsample of the phase one 
sample, stratified according to post-strata. The sampling fractions in various strata are again 
taken as equal ($\nu_j(n'_{\cdot j}) = [c_jn'_{\cdot j}]$ for $n'_{\cdot j}$ large enough, and $c_1 = c_2 = c_3 = c$) and, so that 
different simulations can be compared, $c$ is selected so that the fraction of the entire 
population which appears in the phase 2 sample remains constant at .10. 

Now, the following process is repeated $R = 50{,}000$ times: obtain a preliminary sample of 
size $m$ from which prior guesses $\Pi_{kj}$ for $W_{kj}$ are constructed. Next, a sample, stratified 
according to pre-strata with sampling fractions $f$, is obtained. Only post-stratification is 
observed. Then, a subsample, stratified according to post-strata with sampling fractions $c$, is 
obtained and units in this sample are classified as infected or not infected. Finally, on each 
run, $\hat{\tau}$ is obtained for each value of $M$ considered. The standard error of $\hat{\tau}$ is estimated using 
the $R$ simulated values of $\hat{\tau}$. Recall, however, that in a real-life application, the standard error 
of an estimate will depend on the particular values of $\Pi_{kj}$ used; here, these values are different 
on each run and thus the estimated standard error should be viewed as a long run average for 
a mixture of distributions of $\hat{\tau}$, mixed according to the distribution of the $\Pi_{kj}$ based on the 
preliminary sample. 
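
A much reduced version of this simulation loop can be sketched with the standard library alone. The sketch covers only the case $M = 0$ (no preliminary sample and no prior guesses), uses far fewer repetitions than the study, and draws units directly rather than simulating hypergeometric counts. All function names, the seed and the run count are assumptions made for illustration.

```python
import random
import statistics

# Table 1 cells: (pre-stratum k, post-stratum j) -> (number infected, group size)
TABLE1 = {(0, 0): (40, 4000), (0, 1): (80, 800), (0, 2): (100, 200),
          (1, 0): (2, 200), (1, 1): (80, 800), (1, 2): (2000, 4000)}


def build_population():
    """Return {k: list of (post-stratum j, y)} with y = 1 for infected units."""
    pop = {0: [], 1: []}
    for (k, j), (infected, size) in TABLE1.items():
        pop[k] += [(j, 1)] * infected + [(j, 0)] * (size - infected)
    return pop


def one_run(pop, f, c, rng):
    """One two-phase draw and the estimate of the total with no prior information."""
    # Phase 1: srs without replacement within each pre-stratum.
    first = {k: rng.sample(units, round(f * len(units))) for k, units in pop.items()}
    tau = 0.0
    for j in range(3):
        phase1_j = [y for units in first.values() for (jj, y) in units if jj == j]
        if not phase1_j:
            continue
        # Phase 2: subsample within post-stratum j, pooled across pre-strata.
        size = max(1, int(c * len(phase1_j)))
        ybar = statistics.fmean(rng.sample(phase1_j, size))
        # Estimated post-stratum size: sum over k of N_k * n'_kj / n'_k.
        n_hat = sum(len(pop[k]) * sum(1 for jj, _ in first[k] if jj == j) / len(first[k])
                    for k in pop)
        tau += n_hat * ybar
    return tau


def simulate(runs, f=0.2, c=0.5, seed=1):
    """Return the mean and standard deviation of the estimate over repeated runs."""
    rng = random.Random(seed)
    pop = build_population()
    draws = [one_run(pop, f, c, rng) for _ in range(runs)]
    return statistics.fmean(draws), statistics.stdev(draws)


mean_tau, se_tau = simulate(300)
print(mean_tau, se_tau)
```

Choosing $f = .20$ and $c = .50$ keeps the phase 2 sample at one tenth of the population, as in the study. The standard deviation of the simulated estimates plays the role of the simulated standard error.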

The simulations were performed on an IBM3031 computer. For this example, where 
$y_i \in \{0,1\}$ for all $i$, all random quantities are functions of independent hypergeometric or 
multivariate hypergeometric variables. Using the fact that the conditional distribution of a univariate 
marginal of a multivariate hypergeometric distribution given any subcollection of the other 
coordinates is itself hypergeometric, all random quantities were simulated using the IMSL 92DP 
hypergeometric simulation subroutine GGHPR. For the first combination of $m$ and $f$ (500 and 
.10), the simulation process was repeated five times to check internal consistency. 

Tables 2 and 3 summarize pertinent characteristics of the variation of the simulated SE($\hat{\tau}$) 
as a function of $M$ for the five repeated simulations (Table 2), and the simulations for various 
values of $f$ and $m$ (Table 3). Table 2 gives only highlights which demonstrate internal 
consistency and confirm that the number of repetitions is chosen large enough. Note that $M_0$ 
denotes the value of $M$ for which SE($\hat{\tau}$) is minimized. In Table 3, also given is a comparison 
with the better of the possible usual techniques (regular two-phase or stratified according to 
pre-strata) relative to the ideal where the true strata are regarded as known. The standard 
error of an estimator based on stratified sampling using pre-strata only is 113.27, and for 
stratified according to true strata, it is 105.47. Thus, letting the estimator in regular two-phase 
sampling be denoted by $\hat{\tau}_r$, and realizing that SE($\hat{\tau}_r$) depends upon $f$ and $c$, the values 
appearing in the columns headed Percent Relative Reduction in SE($\hat{\tau}$) are 
$100\,[\min(\mathrm{SE}(\hat{\tau}_r), 113.27) - \mathrm{SE}(\hat{\tau})] / [\min(\mathrm{SE}(\hat{\tau}_r), 113.27) - 105.47]$. 
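
The two benchmark standard errors and the percent relative reduction can be checked from Table 1. The sketch assumes proportionally allocated stratified simple random sampling of one tenth of the population and a finite population variance with divisor $N - 1$; the function names are hypothetical.

```python
import math

# Table 1 cells as (number infected, group size), by pre-stratum.
table1 = {
    "east": [(40, 4000), (80, 800), (100, 200)],
    "west": [(2, 200), (80, 800), (2000, 4000)],
}


def n_times_s2(infected, size):
    """N * S^2 for a 0/1 variable, with S^2 using the divisor N - 1."""
    p = infected / size
    return size * size / (size - 1) * p * (1 - p)


def se_proportional(groups, fraction):
    """SE of the estimated total under proportional stratified srs."""
    return math.sqrt((1 - fraction) / fraction
                     * sum(n_times_s2(a, b) for a, b in groups))


def pct_reduction(se, se_regular, se_pre, se_ideal):
    """Percent relative reduction in SE as defined in the text."""
    best_usual = min(se_regular, se_pre)
    return 100 * (best_usual - se) / (best_usual - se_ideal)


# Row totals give the pre-strata, column totals the post-strata.
pre = [tuple(map(sum, zip(*cells))) for cells in table1.values()]
post = [tuple(map(sum, zip(*col))) for col in zip(*table1.values())]

SE_PRE = se_proportional(pre, 0.10)
SE_TRUE = se_proportional(post, 0.10)
print(round(SE_PRE, 2), round(SE_TRUE, 2))
```

A value of 100 means the estimator does as well as stratifying on the true strata, and a negative value means it does worse than the better of the usual techniques.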


Table 2 
Key Features of the Repeated Runs with m = 500, f = .10 and c = 1.0 

                              SE($\hat{\tau}$) 
Run #    $M_0$     M = 0      M = m     M = $M_0$    M = ∞ 

1        600     113.55     109.67     109.62     112.00 
2        700     113.42     109.50     109.45     111.80 
3        700     113.92     109.86     109.78     112.00 
4        600     113.61     109.71     109.66     112.07 
5        600     113.56     109.74     109.70 
Table 3 
Key Features of SE($\hat{\tau}$) as a Function of M 

          Percent Relative Reduction in SE($\hat{\tau}$) 
          M = 0      M = m      M = ∞ 

          -3.6       46.2       16.3 
          54.5       67.9       32.7 
          56.1       62.1       22.4 
          65.6       72.0       24.8 
          67.7       71.7       17.1 


A variety of important results can be discerned from Table 3. First is that for $m = 500$, 
$M_0$ is very close to, although always slightly larger than, $m$. This is the result predicted by the 
$K = 1$ situation from White (1987). For $m = 2500$, though in every case $M_0 > 10{,}000$, one 
discovers that SE($\hat{\tau}$) at $M = m$ is very close to the minimum at $M = M_0$. 

Second is that at $M = m$, the percent relative reduction in SE($\hat{\tau}$) ranges from a minimum 
of 46% to over 90%. Also, at $M = 0$, corresponding to the situation of dual stratification 
with no prior information on any population characteristic, the percent relative reduction in 
SE($\hat{\tau}$) is always over 50% except in the case of the smallest first phase sampling fraction, 
$f = .10$. In that case, when prior information is not available and the first phase sample size 
is small, one is better off to use the pre-strata and ignore the true stratification. On the other 
hand, if one does have a set of prior guesses available for the collection of $W_{kj}$, but is 
uncertain of what weights to attach to these values, one could use the usual post-stratification notion 
of using weight $M = \infty$. If the prior information is good, as in our case $m = 2500$, then the 
percent relative reduction in SE($\hat{\tau}$) is always over 80%. Even if the prior information is only 
moderately accurate, as in the case $m = 500$, the reduction in standard error is between 16% 
and 33%. 

In summary, if one is able to identify a weighting constant applicable to prior information 
on the distributions of units among strata, then a substantial reduction in standard error can 
be obtained using these methods. Even if one cannot identify such a constant or does not have 
applicable prior information, one can still decrease standard error using dual stratification by 
taking $M = 0$ if the prior information on $W_{kj}$ is either poor or non-existent, or $M = \infty$ with 
accurate prior information. In particular, it thus turns out that the case $M = 0$ is important. 
This case is examined in detail in the next section. 


4. BIAS, STANDARD ERROR, AND OPTIMAL ALLOCATION 
WITH NO PRIOR INFORMATION 


When no prior information is available, we set $M_{\cdot j} = 0$ and $M_{k\cdot} = 0$ for each $1 \le j \le J$ 
and $1 \le k \le K$. In this section, we at first also assume that sampling in both phases is 
proportional to the size of the group from which the sample is drawn, that is, for each 
$k$, $n'_{k\cdot} = fN_{k\cdot}$ (i.e., $f_{k\cdot} = f$, all $k$) and for each $j$, $n_{\cdot j} = cn'_{\cdot j}$ (i.e., $\nu_j(x) = cx$, all $j$). This, 
of course, immediately introduces an approximation (referred to in what follows as 
approximation A1), since the resulting sample sizes are not necessarily integers. However, in reasonably 
large populations, and for reasonably large sampling fractions $f$ and $c$, this approximation has 
little impact on the derivations that follow. 

In this situation, $\hat{\mu}_{\cdot j}$ reduces to $\bar{y}_{\cdot j}$ and $\hat{\Pi}_{kj}$ reduces to $n'_{kj}/n'_{k\cdot}$ and, thus, we have 
$\hat{\tau} = (1/f) \sum_{j=1}^{J} n'_{\cdot j}\,\bar{y}_{\cdot j}$. The derivations of the expectation and variance of $\hat{\tau}$ are summarized 
in the appendix. The key features are two conditioning arguments: first, we condition on $s'$ 
since the second phase sample is a function of $s'$ and, second, because of the multivariate 
hypergeometric nature of the phase one sample, we condition on the values $n'_{kj}$, the sizes of 
the various pre-stratum and post-stratum combinations in the first phase sample. 

In the appendix, we show first that $\hat{\tau}$ in this case is unbiased (aside from approximation A1) 
and that an approximation of its variance is given by 

$$\mathrm{var}(\hat{\tau}) \approx \frac{1 - f}{f} \sum_{k} N_{k\cdot} S_{k\cdot}^2 + \frac{1 - c}{fc} \sum_{j} N_{\cdot j} S_{\cdot j}^2. \qquad (3)$$

As discussed in the appendix in more detail, formula (3) 1) gives answers close to the simulated 
values, 2) is based on approximations whose error is small for large populations and reasonably 
large samples, and 3) reduces to the exact formula in all three of the standard situations. In 
addition, it is easy to show that the variance given by (3) is always smaller than that of the 
situation of regular two phase sampling. 

Now as in any stratification model, there is a question of optimal design. The problem
addressed here is that of minimum variance given a fixed cost. To this end, we let
$T_1 = \sum_k N_{k.} S_{k.}^2$ and $T_2 = \sum_j N_{.j} S_{.j}^2$. We assume, for the design question at hand, that these are
known. In reality, of course, only guesses are available. Next, we let $D$ denote the total budget,
$d_0$ the start-up cost, $d_1$ the cost per unit in the phase one sample, and $d_2$ the cost per unit
in the phase two sample. Letting $D_1$ denote the number of dollars available for sampling per
population unit, we have

$$D_1 = \frac{D - d_0}{N} = f (d_1 + c d_2). \tag{4}$$


With $f$ and $c$ subject to constraint (4), we seek to minimize (3), $\operatorname{var}(\hat{\tau})$, now given by

$$\operatorname{var}(\hat{\tau}) \approx \frac{1-f}{f} T_1 + \frac{1-c}{fc} T_2. \tag{5}$$

The solution is easily found to be given by

$$c = \sqrt{\frac{d_1 T_2}{d_2 (T_1 - T_2)}} \tag{6}$$

with $f$ found using (4). If $T_1 \le T_2$, we automatically take $c = 1$ since then the
pre-stratification is more effective than the post-stratification.
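
The design rule in (4) to (6) can be illustrated with a short numerical sketch. The function names, the cap of both fractions at 1, and the example costs are assumptions made for illustration; they are not part of the original derivation.

```python
import math


def approx_variance(f, c, t1, t2):
    """Approximate variance (5): (1 - f)/f * T1 + (1 - c)/(f * c) * T2."""
    return (1 - f) / f * t1 + (1 - c) / (f * c) * t2


def optimal_fractions(budget_per_unit, d1, d2, t1, t2):
    """Return (f, c) from formula (6) and the cost constraint (4).

    budget_per_unit plays the role of D_1 = (D - d_0) / N.
    Capping c and f at 1 is an assumption of this sketch.
    """
    if t1 <= t2:
        c = 1.0  # pre-stratification dominates, so no sub-sampling in phase two
    else:
        c = min(1.0, math.sqrt(d1 * t2 / (d2 * (t1 - t2))))
    f = min(1.0, budget_per_unit / (d1 + c * d2))
    return f, c


# Hypothetical inputs: phase two is four times as costly as phase one.
f, c = optimal_fractions(budget_per_unit=0.2, d1=1.0, d2=4.0, t1=5.0, t2=1.0)
print(f, c, approx_variance(f, c, 5.0, 1.0))
```

With these made-up inputs, any other value of $c$ that spends the same budget gives a larger value of (5), which is the behaviour the optimum should have.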

In the case of non-proportional sampling, the estimator given is biased and calculations of
the bias and standard error in this more general situation are prohibitive. However, a slight
modification of the second phase sampling design along with the associated change in the
estimator $\hat{\tau}$ yields an estimator which is unbiased. Following a description of the required
modification, we compute the variance and an unbiased estimator of the variance and we find
an optimal method of allocating sampling resources to the various pre- and post-strata.

The modification to the sampling plan is to leave the second phase sample within pre-strata
rather than pooling within post-strata across pre-strata. Thus, given $n'_{kj}$ units appearing in
$s' \cap P_{kj}$, we have a function $u_{kj}(\cdot)$ (like $v_{.j}(\cdot)$ in Section 2) which defines a sample size
$n_{kj} = u_{kj}(n'_{kj}) = c_{kj} n'_{kj}$ to be taken by simple random sampling from $s' \cap P_{kj}$. Based upon
this sample, we obtain the quantities $\bar{y}_{kj}$ and $s_{kj}^2$ which were defined in Section 2. The estimator
is now $\hat{\tau} = \sum_k (1/f_{k.}) \sum_j n'_{kj} \bar{y}_{kj}$.

Now, since samples (and thus estimators) are independent between pre-strata, $\hat{\tau}$ is the sum
of independent estimators of the $K$ pre-stratum totals, where each estimator is based on a regular
double sampling scheme. Thus, the results of Rao (1973) apply to each pre-stratum and we
first observe that $\hat{\tau}$ is unbiased because its summands are unbiased estimators of their
respective pre-stratum totals. Second, using Rao's results, we have

$$\operatorname{var}(\hat{\tau}) = \sum_k \frac{1}{f_{k.}} \left[ (1 - f_{k.}) N_{k.} S_{k.}^2 + \sum_j N_{kj} S_{kj}^2 \left( \frac{1}{c_{kj}} - 1 \right) \right]. \tag{7}$$


Also, an unbiased estimator of var(7) is given by 


re i, Nigel coo Neel nist 
var(?) = Ne | (Ne -1)y S— ea are) ob 
k j ye = 1 .-1 


Nx. = Nk. Nk ( Nyy i 
. Mma es — (Iq - “pe (8) 
Nea(nih— 1) Lu AL ONGS yp npr 


of 


We note at this point that in the case of proportional sampling considered earlier in this
section, we have proposed two different estimators for $\tau$, one based on a pooled second phase
sample, the other unpooled. In both cases, the estimator was found to be unbiased, and, also,
reduction of formula (7) to the case where $f_{k.} = f$ for all $k$ and where $c_{kj} = c$ for all $k$ and
all $j$ yields formula (3), i.e., the approximate variance for the pooled second phase sampling
estimator.

Finally, again following the results in Rao, we derive an optimal allocation of sampling
resources. Say that $D$ dollars are available for the two phases of sampling, where sampling
a unit in phase 1 from $P_{k.}$ costs $d_{k.}$ dollars and sampling a unit in phase 2 from $P_{.j}$ costs $d_{.j}$
dollars. Given these costs, we wish to find the values of $f_{k.}$ and $c_{kj}$ which minimize the variance
of $\hat{\tau}$. Using the Cauchy inequality for the phase 2 sample in each pre-stratum, we observe that
no matter what the value of $f_{k.}$, the sampling fraction from post-stratum $j$ is given by

$$c_{kj} = S_{kj} \sqrt{\frac{d_{k.}}{d_{.j} \left( S_{k.}^2 - \sum_j W_{kj} S_{kj}^2 \right)}}. \tag{9}$$

Now, the effective expected cost (over both phases of sampling) for each unit sampled in
phase 1 and in pre-stratum $k$ is given by

$$d_k^* = d_{k.} + \sum_j W_{kj} c_{kj} d_{.j}. \tag{10}$$

When viewed in this way, for cost considerations, the first phase of sampling can be seen as
a regular stratified sample with (effective) cost of a unit sampled in $P_{k.}$ given by (10). Thus,
Cochran (1977, p. 97) provides the required formulation of the first phase allocation:

$$\frac{n'_{k.}}{n'} = \frac{N_{k.} S_{k.} / \sqrt{d_k^*}}{\sum_k N_{k.} S_{k.} / \sqrt{d_k^*}} \tag{11}$$

where

$$n' = \sum_k n'_{k.} = D \, \frac{\sum_k N_{k.} S_{k.} / \sqrt{d_k^*}}{\sum_k N_{k.} S_{k.} \sqrt{d_k^*}}. \tag{12}$$

Following the modifications suggested by Rao, one can handle the situation where one or
more of the $c_{kj}$ turn out to be greater than one. One can also modify the results in the usual
way to minimize sampling cost in the case of pre-determined variance.
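
As a rough numerical illustration of the second phase fractions and the effective unit cost described above, the following sketch evaluates both for a single pre-stratum. It assumes the reading $c_{kj} = S_{kj} \sqrt{d_{k.} / (d_{.j}(S_{k.}^2 - \sum_j W_{kj} S_{kj}^2))}$ with $W_{kj} = N_{kj}/N_{k.}$, and an effective cost equal to $d_{k.} + \sum_j W_{kj} c_{kj} d_{.j}$. The function names are hypothetical, and simply capping fractions at 1 is a crude stand-in for Rao's modification, not a reproduction of it.

```python
import math


def second_phase_fractions(s_k, w_kj, s_kj, d_k, d_j):
    """Second phase sampling fractions c_kj for one pre-stratum k.

    s_k  : standard deviation within the pre-stratum
    w_kj : cell weights N_kj / N_k. (one per post-stratum)
    s_kj : standard deviations within each cell
    d_k  : phase one unit cost in the pre-stratum
    d_j  : phase two unit costs (one per post-stratum)
    """
    within = sum(w * s * s for w, s in zip(w_kj, s_kj))
    between = s_k ** 2 - within
    if between <= 0:
        # Post-strata explain nothing here: take every phase one unit (assumption).
        return [1.0] * len(s_kj)
    return [min(1.0, s * math.sqrt(d_k / (d * between)))
            for s, d in zip(s_kj, d_j)]


def effective_unit_cost(d_k, w_kj, c_kj, d_j):
    """Expected cost over both phases of one phase one unit in the pre-stratum."""
    return d_k + sum(w * c * d for w, c, d in zip(w_kj, c_kj, d_j))


# Hypothetical pre-stratum with two equally weighted post-strata.
c = second_phase_fractions(5.0, [0.5, 0.5], [3.0, 3.0], 1.0, [4.0, 4.0])
print(c, effective_unit_cost(1.0, [0.5, 0.5], c, [4.0, 4.0]))
```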


5. APPLICATIONS 


One can employ the method of dual stratification presented here at two levels. At one level, 
double sampling with pre-strata can be employed with no use of prior information on stratum 
sizes or stratum averages. At a more complex level, if one has in hand prior information on 
the number of units in each stratum coming from each pre-stratum, and if the sampler has 
a level of confidence for this information, then a further reduction in standard error can be 
obtained by employing this prior information. 

This two phase sampling and estimation technique could be used in the proposed nation- 
wide survey to determine the extent of spread of the HTLV-III (Acquired Immune Deficiency 
Syndrome) virus. The extended incubation period, estimated to be on the average 4.5 years 
(Lui et al. 1986), makes the survey approach imperative, yet there are psychosocial and
financial factors which make such a survey extremely difficult to carry out. Thus, methods which
assist in reducing sample size while maintaining accuracy must be pursued. 

Allen (1984) provides data which suggests a partition of the American population according 
to a variety of factors which can be used to define risk categories. Known factors, which could 
be used to define pre-strata, include age, gender, presence of certain diseases, nationality, 
immigration status, and geographical location. Unknown factors, which could be determined 
via interview, include sexual preference and drug use. Data on the prevalence of HTLV-III 
within various subgroups can be both 1) incorporated into the overall estimate of prevalence 
and 2) used to determine sampling allocations. Such data is available, for example, for blood 
donors (Kuritsky et al. 1986), military results (Redfield and Burke 1987), intravenous drug 


114 White: Estimation Using Double Sampling and Dual Stratification 


abusers in Queens, New York (Robert-Guroff et al. 1986) and male homosexuals in Greenwich 
Village (Casareale et al. 1984/5). Though this prior information can be used to reduce cost 
and increase accuracy, confidentiality and sensitivity/specificity of the HTLV-III test remain 
as significant obstacles which must be addressed carefully before such a study will provide mean- 
ingful results. 

APPENDIX


Derivation of Expectation and Variance With No Prior Information and Proportional 
Sampling 


Using the notation given in Section 2, we proceed first with the derivation of $E(\hat{\tau})$. The
conditional expectation given $s'$ is $E(\hat{\tau} \mid s') = (1/f) \sum_j n'_{.j} \bar{y}'_{.j}$. Then, writing $n'_{.j} \bar{y}'_{.j}$ as $\sum_k n'_{kj} \bar{y}'_{kj}$,
we find $E(\hat{\tau}) = E(E(\hat{\tau} \mid s')) = (1/f) \sum_j \sum_k E(n'_{kj} E(\bar{y}'_{kj} \mid n'_{kj})) = (1/f) \sum_j \sum_k E(n'_{kj}) \bar{Y}_{kj} = \tau$
since $n'_{kj}$ is hypergeometric with sampling fraction $f$ and $N_{kj}$ units in pre-stratum $k$
and post-stratum $j$. Thus, $\hat{\tau}$ is, in this case, unbiased (ignoring approximation A1).

Computation of the variance is along the same lines, yet much more technically detailed.
Only certain elements of the computation will be presented and particular emphasis will be
placed on the points in the derivation where approximations are made. First, some
computation using the two phases of conditioning discussed above yields

$$\operatorname{var}(E(\hat{\tau} \mid s')) = \frac{1-f}{f} \sum_k N_{k.} S_{k.}^2. \tag{13}$$


We next obtain

$$\operatorname{var}(\hat{\tau} \mid s') = \frac{1-c}{f^2 c} \sum_j \frac{n'_{.j}}{n'_{.j} - 1} \left[ \sum_k (n'_{kj} - 1) s'^2_{kj} + \sum_k n'_{kj} (\bar{y}'_{kj} - \bar{y}'_{.j})^2 \right]. \tag{14}$$

Our second and third approximations are to approximate $n'_{.j}/(n'_{.j} - 1)$ by one (A2) and
$(n'_{kj} - 1)$ by $n'_{kj}$ (A3) in equation (14). We now require the expectation of the first term in
(14) and find


$$E\left[ \frac{1-c}{f^2 c} \sum_j \sum_k n'_{kj} s'^2_{kj} \right] \approx \frac{1-c}{fc} \sum_j \sum_k N_{kj} S_{kj}^2. \tag{15}$$

In (15), one further approximation (A4) is necessary; we ignore the possibility of $n'_{kj} \le 1$ for
any $k, j$. We also require the expectation of the second term in (14). The exact formula turns
out to be

$$\frac{1-c}{f^2 c} \sum_j \left\{ f \sum_k N_{kj} (\bar{Y}_{kj} - \bar{Y}_{.j})^2 + \sum_k a_1 S_{kj}^2 - a_2 \right\} \tag{16}$$

where $a_1 = 1 - f - E[n'_{kj}(1 - n'_{kj}/N_{kj})/n'_{.j}]$ and $a_2 = E[(\sum_k n'_{kj} (\bar{Y}_{kj} - \bar{Y}_{.j}))^2 / n'_{.j}]$.
We note first that $|a_1| < 1$ and thus when combined with $N_{kj}$ in (15), it can be ignored
(approximation A5). Also, if in $a_2$, $n'_{.j}$ is approximated (A6) by its expectation, $fN_{.j}$, since
$E[\sum_k n'_{kj} (\bar{Y}_{kj} - \bar{Y}_{.j})] = 0$, we have

$$a_2 \approx \frac{1}{fN_{.j}} \operatorname{var}\left( \sum_k n'_{kj} (\bar{Y}_{kj} - \bar{Y}_{.j}) \right) \approx (1 - f) \sum_k \frac{N_{kj}}{N_{.j}} (1 - W_{kj}) (\bar{Y}_{kj} - \bar{Y}_{.j})^2$$

where we have finally approximated $(N_{k.} - 1)$ by $N_{k.}$ (A7) in computing the variance of the
hypergeometric variable $n'_{kj}$. When compared to the similar term with coefficient $N_{kj}$ in (16),
we discover that $a_2$ itself is approximately negligible. Finally, once again ignoring differences
between $N_{kj}$ and $(N_{kj} - 1)$ or between $N_{.j}$ and $(N_{.j} - 1)$ (approximation A8), (15) and (16)
can be combined to yield

$$E[\operatorname{var}(\hat{\tau} \mid s')] \approx \frac{1-c}{fc} \sum_j N_{.j} S_{.j}^2. \tag{17}$$

Combining (13) and (17), we finally obtain

$$\operatorname{var}(\hat{\tau}) \approx \frac{1-f}{f} \sum_k N_{k.} S_{k.}^2 + \frac{1-c}{cf} \sum_j N_{.j} S_{.j}^2. \tag{18}$$


The validity of this approximation rests on three facts. First, when (18) is evaluated in the
five examples for which simulated data exist, the results compare very favorably. The
approximated standard error given by (18) is 113.25, 108.97, 108.09, 106.77, and 106.32 for $f$ = .10,
.20, .30, .40, and .50, respectively. These values are nearly equal to those in Table 3 and the
column giving SE($\hat{\tau}$) and M = 0 with m equal to 500 or 2500. Second, the error introduced
by each approximation made was analyzed and found, with the possible exception of
approximation A6, to be negligible in the case of relatively large population and sample sizes. Even
in the case of A6, the law of large numbers indicates that $n'_{.j}$ will be well approximated by its
expectation if the sample sizes are reasonably large. Finally, as described in the following, this
approximation formula reduces to the exact formula in all three standard situations. First, this
situation reduces to the usual stratified sampling according to pre-strata when we take $J = K$,
$P_{.j} = P_{k.}$ for $j = k$, and $c = 1$. Here, formula (18) reduces to $\operatorname{var}(\hat{\tau}) = ((1 - f)/f) \sum_k N_{k.} S_{k.}^2$,
which is well known to be the exact formula. Also, the estimation scheme described
reduces to the usual two phase sampling for stratification when we take $K = 1$ and (18) again
reduces to the exact formula (see Cochran 1977, p. 329). Similarly, we obtain the situation of
regular stratified sampling by post-strata if we take $f = 1$ (here, $K$ and the pre-stratification
become irrelevant), and formula (18) again reduces to the exact value.

ABSTRACT


The National Farm Survey is a sample survey which produces annual estimates on a variety of subjects 
related to agriculture in Canada. The 1988 survey was conducted using a new sample design. This design 
involved multiple sampling frames and multivariate sampling techniques different from those of the 
previous design. This article first describes the strategy and methods used to develop the new sample design, 
then gives details on factors affecting the precision of the estimates. Finally, the performance of the new 
design is assessed using the 1988 survey results. 

1. INTRODUCTION


The National Farm Survey (NFS) is a probability-based sample survey focussing on several 
subjects related to agriculture in Canada. It is conducted annually in June and July in all pro- 
vinces except Newfoundland, where a separate survey is carried out. 

The previous NFS sample design, dating from 1983, was based on the results of the 1981 
Census of Agriculture. A description of it may be found in Ingram and Davidson (1983). How- 
ever, since 1981 the farm population has changed significantly, reducing the effectiveness of 
this design. Furthermore, the requirements of the survey have changed somewhat over the years, 
resulting in the need to update the samples. 

A new sample design was therefore developed based on the results of the 1986 Census of 
Agriculture, and became operational in the summer of 1988. 


2. OBJECTIVES OF THE SURVEY 


The primary objective of the survey is to provide timely, reliable estimates of levels and 
annual trends for over 100 agriculture variables. Essentially, these variables may be divided 
into three categories: cropland areas for the current year; livestock numbers on July 1; and 
receipts and operating expenses for the previous calendar year. In terms of reliability, the objec- 
tive of the survey is to obtain coefficients of variation (CV) below 5% at the provincial level 
for the major parameters. 

Survey data are normally summarized to the provincial level. However, primarily for analysis 
purposes, results for sub-provincial regions are also produced using domain estimation 
methods. 

Another important objective of the survey is to obtain a master sample from which sub- 
samples are chosen for use in other farm surveys conducted by Statistics Canada. 


1 C. Julien is a methodologist with the Census Data Quality and Analysis Section, Social Survey Methods Division,
Statistics Canada, Ottawa, Ontario, K1A 0T6; F. Maranda is chief of the Agriculture Survey Methods Section, Business
Survey Methods Division, Statistics Canada, Ottawa, Ontario, K1A 0T6.

3. TARGET POPULATION AND SURVEY POPULATION


The target population includes all farms in the provinces surveyed which received $250 or 
more from the sale of agricultural products during the 12 months preceding the survey. Also 
included are farms which do not meet the $250 criterion at the time of the survey, but which 
expect to earn at least this sum during the 12 months following the survey. Such farms, which 
either began operating just prior to the survey or are temporarily inactive, are relatively few 
in number. 

The survey population, or the group from which the sample is selected, excludes farms 
operated by institutions as well as those located on Indian reserves or settlements. The terms 
institution, Indian reserve and Indian settlement are defined in Statistics Canada (1987, pp. 
115-117, 145, 152). The cost-benefit ratio associated with collecting data on these types of farms 
is very high. Because of this, they are excluded in order to enable more efficient use of the 
resources available for the survey. The contribution of such exclusions to national agricultural 
production is small and is estimated using adjustment factors which are based on Census data. 


4. SAMPLING FRAMES AND THEIR USE 


In theory, the survey population is divided into two groups, the first of which includes the 
farms enumerated in the Census and the second all other farms. These include the undercoverage 
from the Census and so-called new farms, that is, those which began operating after the Census. 

The first group is covered all or in part, depending on the province, by one or two list frames 
created from the list of census farms. To complement the list frames and ensure complete cov- 
erage of the survey population, an area frame, created from the agricultural enumeration areas 
(EAs), is used. An enumeration area is the geographical region enumerated by a census represen- 
tative. Furthermore, an EA is said to be agricultural if it contains at least one census farm. 
An area frame is needed to compensate for the shortcomings of the list frames, particularly
their difficulty in identifying new farms.

The estimation requirements of the survey and the characteristics of agriculture in Canada 
vary by region. To better account for these variations, the territory covered by the survey is 
divided into three regions and a different sample design is used in each one. The three regions 
involved are: the Prairie provinces and the Peace River district in British Columbia; Quebec 
and Ontario; and, finally, the Maritime provinces and the rest of British Columbia. The first 
of these regions is called the Canadian Wheat Board (CWB) region, since the entire region comes 
under the jurisdiction of this organization. 

The total sample size in each of the three regions is essentially based on the overall budget 
available for data collection. Within each region, sample allocation among the various pro- 
vinces and, where applicable, among the various frames, depends on several factors. The 
primary ones are the square root rule applied to the size of the survey population, historical 
allocations in the survey, and the results of various analyses centred on the expected precision 
of the estimates. 


4.1 The Canadian Wheat Board Region 


In this part of Canada, two list frames and one area frame are used in each province. 

The first list frame (L1 list) essentially includes the large and medium-sized census farms 
in relation to key crop, livestock and expense variables. This list is obtained using an iterative 
process which consists in establishing a threshold for each key variable and including in the 
list all farms that exceed at least one of these thresholds. Each threshold is adjusted separately
upward or downward so that the L1 list, once completed, includes approximately 35% of the 
survey population’s farms and accounts for 50% to 90% of the total agricultural activity, 
depending on the key variable in question. These percentages are used because experience has 
shown that the resulting list is composed of farms which, individually, are more stable over 
time than the rest of the farms in the survey population. This stability leads to the creation 
of strata which remain homogeneous over the years, which is a factor in maintaining the 
efficiency of the sample design. 

In each province, the L1 list is then stratified within sub-provincial regions based on nine 
key variables. A sample of farms is selected and used to obtain data on crops and livestock. 
Because data on expenses are more difficult and costly to collect, only a sub-sample, called 
the core sample, is used to obtain this information. 

The second list frame (L2 list) includes all census farms with more than 20 acres of cropland 
which were not included in the L1 list. The L2 list is stratified within crop districts based on 
a single key variable, namely, cropland area at the time of the Census. The L2 list is used to 
complement the L1 list for preliminary crop data. These data must be collected within very 
tight deadlines which, for operational reasons, cannot be met using the area frame. 

The area frame includes all agricultural enumeration areas, except those on Indian reserves 
and in the so-called marginal agricultural regions, that is regions with little agricultural activity. 
Marginal regions are found mostly in the northern parts of the provinces and in urban fringes. 
The few census farms located in marginal regions are added to the L1 list, since it is the only
list used to collect data on all survey variables. 

The area frame is stratified using the same sub-provincial regions and key variables as the 
L1 list. It ultimately produces a sample of segments which are delineated on topographic maps. 
The identity of the farmers operating land in one of these segments is obtained through on-site 
enumeration. Manual matching of names and addresses then enables detection of segment farms 
overlapping one of the list frames. This detection is essential because each time the area frame 
is used to complement a list frame, only those segment farms that do not overlap the list in 
question are used, thus ensuring that the list and area frames represent mutually exclusive 
domains. 

Complete information is required on all segment farms except those overlapping the L1 list, 
as the data for this list are obtained from the sample selected from it. 


4.2 Quebec and Ontario 


In each of these provinces, a single list frame, called L1, and an area frame are used. 

The list frame is composed of all census farms in the survey population. The methodology 
used in sampling from this list is similar to that used for the CWB region L1 list, apart from
two differences. First, incorporated farms, or farms founded as business corporations, are 
separated from the other farms, and strata are created independently within the two groups. 
This preliminary separation is performed because only incorporated farms are required to report 
their expenses in the survey, since the expenses of the non-incorporated farms are obtained 
from Revenue Canada tax records. It should be noted that the confidentiality of these records 
is completely protected under the Statistics Act. Second, sub-sampling for expenses is 
unnecessary because less than 25% of the farms in the survey population are incorporated. 

The area frame and its sample design have not been modified following the last Census, 
due to a lack of resources. Only the marginal regions were updated, resulting in their 
enlargement. 

4.3 Maritime Provinces and the Rest of British Columbia


In each province of this region, the sample design includes only one list frame, again called 
L1, which is made up of all census farms in the survey population. Given that a list frame tends 
to deteriorate with time and that there is no area frame to supplement it, it becomes more 
difficult to completely cover the survey population. However, because of the relatively small 
number of farms, under 30 000 in these provinces, more complex procedures were implemented 
to keep the list up-to-date. Notably, farms which were missed in the Census or which began 
operating following it may be detected through these procedures. Thus, for all practical 
purposes, the list frame is considered to ensure full coverage of the survey population. 

In each province of this region, the list is stratified and a sample of farms is selected using the 
same approach as in Quebec and Ontario. All the estimates required are produced from this 
sample. 


5. LIST SAMPLING TECHNIQUES 


Samples are taken from the list frames using a one-stage, stratified sample design where the
farms constitute the sampling units. The strategy and methods used to develop this design are 
essentially the same, regardless of the province and list involved. However, the combination 
of methods and key variables used may vary from case to case. 

The first step consists in identifying the farms with distinct characteristics and in 
automatically including them in the sample. There are essentially two kinds of these so-called 
self-representative or take-all farms. The first group includes those with a unique operating 
structure such as community pastures and multiholding corporations, while the second group 
contains the farms which clearly stand out from the majority because of their very large con- 
tributions to key crop, livestock and expense variables. Due to the skewness (to the right) of 
the distributions involved, complete enumeration of these farms is an efficient way to reduce 
sampling variance. 

Farms with very large contributions are identified through an intuitively-based rule which 
produced good results in the previous sample design. This rule, called the sigma-gap rule, is 
applied separately to each key variable using all farms having a non-zero value for the variable 
in question. Farms with a sufficiently high contribution to one of the key variables, as deter- 
mined by this rule, are said to be take-all. 

The sigma-gap rule, as adapted to the survey, functions as follows. Given a univariate
distribution of points $x_i$, $i = 1, 2, \ldots, N$, $x_i > 0$ for all $i$, and given $\sigma$ as its standard
deviation, the points are arranged in increasing order $x_1 \le x_2 \le \ldots \le x_N$; for the half of the
distribution to the right of the median, the distance between each successive pair of
points $d_i = x_i - x_{i-1}$ is determined; given $i_0$, the smallest $i$ for which $d_i \ge \sigma$, all points
$i \ge i_0$ correspond to take-all farms. If $d_i < \sigma$ for all $i$, no point in this distribution
distinguishes itself sufficiently from the others to be declared a take-all farm.
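
A minimal sketch of the sigma-gap rule as described above, applied to one key variable. The function name is hypothetical, and two details are assumptions of the sketch rather than statements from the survey documentation: the population standard deviation is used for $\sigma$, and "the half of the distribution to the right of the median" is taken to mean the sorted positions from the middle index upward.

```python
import statistics


def sigma_gap_take_all(values):
    """Return the values flagged as take-all by the sigma-gap rule.

    Only strictly positive values take part, as in the rule's description.
    """
    xs = sorted(v for v in values if v > 0)
    if len(xs) < 2:
        return []
    sigma = statistics.pstdev(xs)  # assumption: population standard deviation
    if sigma == 0:
        return []  # all values equal, so nothing stands out
    start = max(len(xs) // 2, 1)  # assumption: upper half starts at the middle index
    for i in range(start, len(xs)):
        if xs[i] - xs[i - 1] >= sigma:
            return xs[i:]  # first large gap found: this point and all larger ones
    return []


# Hypothetical farm sizes: two farms clearly separated from the rest.
print(sigma_gap_take_all([1, 2, 3, 4, 5, 6, 7, 8, 100, 120]))
```

In the survey the rule is applied separately to each key variable, so a farm would be take-all if it were flagged for at least one of them.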

The second step consists in dividing the rest of the farms in the list into take-some strata. 
In most cases, the strata are formed within sub-provincial regions according to nine key variables 
representing the usual three categories: crops, livestock and operating expenses. The number 
of variables in each category is one, six and two respectively. 

The underlying principle to the stratification is as follows. Each farm is characterized by 
nine variables, and neighbouring farms, defined in terms of Euclidean distance, are grouped
together. Two multivariate clustering algorithms are used for this purpose. These algorithms 
are called FASTCLUS and CLUSTER, since they are available in the procedures of the same 
name in the SAS statistical analysis software package (version 5). 

The FASTCLUS algorithm divides a set of observations into a predetermined number of
mutually exclusive clusters. First, the algorithm chooses observations which serve as initial 
cluster seeds. Each observation is then assigned to the nearest seed, and once this is completed, 
the cluster seeds are updated by the means of the clusters thus formed. The process is repeated 
until the changes in the seeds become minimal. The FASTCLUS algorithm is based on work 
by Hartigan (1975) and MacQueen (1967). 
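
The assign-and-update loop described above can be sketched in a few lines. This is a generic nearest-seed (k-means style) iteration written for illustration, not the SAS FASTCLUS procedure itself; in particular, using the first k observations as initial seeds and the stopping tolerance are assumptions of the sketch.

```python
def nearest_seed_clustering(points, k, max_iter=100, tol=1e-9):
    """Group points into k clusters by repeated nearest-seed assignment.

    points : list of equal-length numeric tuples
    Returns (labels, seeds).
    """
    seeds = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each observation to the nearest seed (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, seeds[c])))
            for p in points
        ]
        # Replace each seed by the mean of the cluster formed around it.
        new_seeds = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_seeds.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_seeds.append(seeds[c])  # keep an empty cluster's seed unchanged
        shift = max(sum((a - b) ** 2 for a, b in zip(s, t))
                    for s, t in zip(seeds, new_seeds))
        seeds = new_seeds
        if shift <= tol:  # seeds no longer move: stop
            break
    return labels, seeds


# Hypothetical data: two well separated groups of three points.
pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
print(nearest_seed_clustering(pts, 2))
```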

The CLUSTER algorithm groups a set of observations into mutually exclusive clusters in 
a hierarchical structure. Initially, each observation forms a cluster in itself. Based on a technique 
inspired by Ward (1963), the two most similar clusters are combined into one, which subse- 
quently replaces them. The process is repeated until only one cluster remains. Massart and 
Kaufman (1983) provide an introduction to this type of classification. Thus, the set of obser- 
vations is broken down into as many partitions as there were observations to begin with, and 
each partition corresponds to a stratification. 

These algorithms are used successively as follows. FASTCLUS is used first to group the 
farms into 250 clusters, which are then progressively combined to form the strata using 
CLUSTER. Initial classification is performed with FASTCLUS, since using CLUSTER directly 
with a high number of records would require excessive computer time. 

Each of the three categories of variables must contribute equally to strata formation. To
ensure this, the initial stratification variables are transformed so that the sum of the transformed
variables in each category has a mean 0 and a predetermined variance, usually 1. The crop
category with its single variable may be standardized in the usual manner by subtracting its mean
and dividing by its standard deviation. In each of the other two categories, two successive
transformations are performed independently. Given $X_i$, the initial variables of a given
category $C$, a principal components analysis is performed to obtain transformed variables $Y_i$.
These new variables, with mean $\mu_i$ and variance $\sigma_i^2$, are linear combinations of the former ones
and mutually independent. The $Y_i$ are then standardized to obtain final stratification variables
$Z_i$ as follows:

$$Z_i = \frac{Y_i - \mu_i}{\left( \sum_{i \in C} \sigma_i^2 \right)^{1/2}}.$$

Thus, the mean and variance for $\sum_{i \in C} Z_i$ are 0 and 1 respectively.
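
The within-category scaling can be checked numerically. The sketch below starts from component scores that are already uncorrelated (the principal components step is assumed to have been done elsewhere) and divides each centred component by the square root of the summed variances. The function name and the toy data are hypothetical.

```python
import math
import statistics


def standardize_category(components):
    """Scale uncorrelated component scores so that their sum has variance 1.

    components : list of columns, one list of scores per component Y_i
    Returns the columns Z_i = (Y_i - mean_i) / sqrt(sum of variances).
    """
    means = [statistics.fmean(col) for col in components]
    total_var = sum(statistics.pvariance(col) for col in components)
    if total_var == 0:
        raise ValueError("all components are constant")
    scale = math.sqrt(total_var)
    return [[(y - m) / scale for y in col] for col, m in zip(components, means)]


# Two hypothetical components with zero sample covariance.
z1, z2 = standardize_category([[1, -1, 1, -1], [1, 1, -1, -1]])
total = [a + b for a, b in zip(z1, z2)]
print(statistics.fmean(total), statistics.pvariance(total))
```

Because the components are uncorrelated, the variance of the sum is the sum of the variances, which is why a single common divisor is enough to give each category the same total weight.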

An empirical approach is used to determine the number of strata. Several stratifications 
and allocations are performed by varying the number of strata. Then, the coefficient of varia- 
tion curve is drawn as a function of the number of strata for all key variables and many others. 
These curves generally resemble Figure 1. Stratification gains are considered to have been vir- 
tually fully attained at the point where the majority of curves are practically horizontal. The 
number of strata chosen is a compromise between this point and the desire to avoid forming 
too many strata so as to attenuate the effects of incorrect initial classification and stratum 
jumpers over time, two major causes of outliers or influential observations. 

Sample allocation is multivariate and is generally carried out using the same key variables
used for stratification. The allocation algorithm consists in minimizing a linear combination
of the squares of the coefficients of variation of the key variables, within the constraint of a
fixed total sample size. Given $c_i$, the coefficient of variation for key variable $i$, $a_i > 0$ a constant
and $n$ the total sample size, $\sum_i a_i c_i^2 = f(n)$ must be minimized within the constraint $n = n_0$. The
algorithm used is described in Bethel (1986). Adjustments are then made to obtain a minimum
sample size of 4 and a maximum weighting factor of 50 in each stratum.
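
The two post-allocation adjustments can be expressed as simple bounds on each stratum sample size. The sketch below is one plausible reading: the weighting factor is taken to be $N_h / n_h$, and the function name is hypothetical. The source does not say whether the total sample size is rebalanced afterwards, so no rebalancing is attempted here.

```python
import math


def adjust_allocation(stratum_sizes, sample_sizes, min_n=4, max_weight=50):
    """Raise stratum sample sizes to respect a minimum size and a maximum weight.

    stratum_sizes : population counts N_h
    sample_sizes  : sample sizes n_h produced by the allocation algorithm
    """
    adjusted = []
    for big_n, small_n in zip(stratum_sizes, sample_sizes):
        # N_h / n_h <= max_weight is equivalent to n_h >= N_h / max_weight.
        needed = max(small_n, min_n, math.ceil(big_n / max_weight))
        adjusted.append(min(needed, big_n))  # never sample more farms than exist
    return adjusted


# Hypothetical strata: one under-sampled, one too small a sample, one tiny stratum.
print(adjust_allocation([1000, 30, 3], [10, 2, 1]))
```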

Figure 1. General Curve of the Coefficient of Variation as a Function of the Number of Strata


Finally, once allocation has been completed, the farms are sorted within each stratum by 
sub-provincial region and total operating expenses and a sample is selected using circular 
systematic sampling. For the L1 list in the CWB region, the complete sample is chosen first; 
the core sub-sample is then selected from it using circular systematic sampling. 
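
Circular systematic sampling within a sorted stratum can be sketched as follows. This is a textbook-style version written for illustration: the integer interval `N // n`, the function name, and the use of Python's `random.Random` for the start are assumptions, and the survey's production procedure may differ in these details.

```python
import random


def circular_systematic_sample(units, n, rng=None):
    """Select n units by circular systematic sampling.

    units : list already sorted in the desired order (for example by
            sub-provincial region and total operating expenses)
    n     : sample size, 0 < n <= len(units)
    """
    rng = rng or random.Random()
    big_n = len(units)
    if not 0 < n <= big_n:
        raise ValueError("sample size must be between 1 and the stratum size")
    step = big_n // n            # sampling interval (assumption: integer part of N/n)
    start = rng.randrange(big_n)  # random start anywhere on the circle
    # Walk around the list, wrapping past the end back to the beginning.
    return [units[(start + i * step) % big_n] for i in range(n)]


# Hypothetical stratum of ten farms, identified by their sorted position.
print(circular_systematic_sample(list(range(10)), 3, random.Random(0)))
```

Since `n * step` never exceeds the stratum size, the selected positions are always distinct. Applying the same function to the selected sample would give a sub-sample in the manner described for the core sample.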


6. AREA SAMPLING TECHNIQUES 


Area samples are selected according to a two-stage stratified sample design. The Census 
enumeration areas and segments represent the primary and secondary sampling units 
respectively. 

Given that the area sample design has not been modified for Quebec and Ontario, the 
following paragraphs apply only to the CWB region. 

The first step consists in measuring the agricultural activity in each of the frame’s EAs by 
summarizing to the EA level the data for the census farms not included on the L1 list. Excluding 
the L1 list farms from the summarization process produces EA distributions which accurately 
reflect the characteristics of small farms. Subsequent use of these distributions enables an area 
sample complementing the L1 list with respect to small farms to be selected with greater 
efficiency. 

Once the summarization process has been completed, each EA is treated as a farm for 
sampling purposes. The EA selection strategy and methods are very similar to those applied 
to the CWB region L1 list. First, take-all EAs are determined using the sigma-gap rule. The
remaining EAs are then allocated to take-some strata within sub-provincial regions using the 
CLUSTER multivariate clustering algorithm. Preliminary classification with FASTCLUS is 
unnecessary in this case due to the relatively low number of EAs, never more than 3000 per 
province, to be processed. Furthermore, the usual standardizations suffice for transforming 
the key variables. A principal components analysis was not used because the area frame’s con- 
tribution to provincial estimates does not justify such an approach. 

Allocation to strata is performed with the same algorithm used for the list, and the minimum 
sample size is again established at 4. The sample size is then divided by four in each stratum, 
and four separate replicates are selected using circular systematic sampling. Replicates facilitate 
variance calculation, as a single secondary unit is often chosen per primary unit. 

Once the EAs have been selected, their boundaries are traced on topographic maps and they 
are divided into segments of approximately 7.5 km² (3 mi²). Natural boundaries such as roads
and rivers are used as much as possible to facilitate the work of field interviewers. Simple 
random sampling without replacement of the segments is performed at a minimum rate of 1 
out of 30 in each selected EA. There are, however, some exceptions to the rule: additional 
segments are taken so that the overall weighting factor does not exceed 180; a minimum of 
two segments are selected in each EA belonging to the strata subjected to first-stage complete 
enumeration; and, finally, when the same EA appears in more than one replicate, measures 
are taken to avoid selecting the same segment more than once. Nevertheless, these exceptions 
are rare. 
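The segment selection rules above can be illustrated with a short sketch. It assumes that the overall weighting factor is the product of the first-stage (EA) weight and the inverse of the segment sampling fraction; the function and parameter names are hypothetical.

```python
import math
import random


def segments_to_select(n_segments, ea_weight, take_all_ea,
                       rate_denominator=30, max_weight=180.0):
    """Number of segments to draw in one selected EA (illustrative rules)."""
    # Minimum sampling rate of 1 segment out of 30.
    m = math.ceil(n_segments / rate_denominator)
    # Extra segments so that ea_weight * (n_segments / m) stays at or below the cap.
    m = max(m, math.ceil(ea_weight * n_segments / max_weight))
    # At least two segments in EAs from strata enumerated completely at the first stage.
    if take_all_ea:
        m = max(m, 2)
    return min(m, n_segments)


def draw_segments(segment_ids, ea_weight, take_all_ea, seed=None):
    """Simple random sample of segments without replacement."""
    m = segments_to_select(len(segment_ids), ea_weight, take_all_ea)
    return random.Random(seed).sample(segment_ids, m)
```

For example, an EA with 40 segments and a first-stage weight of 10 would need three segments rather than two, since two segments would give an overall weight of 200.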


7. RESULTS OF THE SAMPLE DESIGN 


Table 1 contains the results of the list frame sample design. The following items are included: 
the number of farms in the list (N); the number of strata (H); the number of farms in the sample
(n); and, finally, the number of farms in the core sub-sample (n-core) in those provinces where
it applies. 


Table 1 
Results of the List Frame Sample Design 


                            L1 List                              L2 List
Province            N        H        n    n-core         N        H        n

P.E.I.            2,830     26      451
N.S.              4,273     35      550
N.B.              3,544     39      498
Quebec           41,380     80    6,096
Ontario          72,598     78    8,401
Manitoba          6,712     48    1,364      490       18,058     29    2,267
Saskatchewan     15,668     48    3,625    1,106       45,798     41    4,573
Alberta          13,928     63    2,981      909       38,504     25    2,973
B.C. (Peace)^a      494     21      190      190        1,187      6      170
B.C. (rest)^b    17,042     41    1,999

Total           178,469    479   26,155    2,695      103,547    101    9,983


a Peace River district in British Columbia.
b British Columbia minus the Peace River district.

Table 2 contains the results of the area sample design in those provinces where such a design
is used. The following items are indicated: the number of EAs in the frame (N); the number
of strata (H); the total number of EAs sampled (n); the number of EAs sampled where each
EA is counted only once when it appears in more than one replicate (n-once); and, finally, the
number of segments chosen (m).


8. FACTORS AFFECTING THE PRECISION OF THE ESTIMATES 


To better appreciate the results obtained from the 1988 survey, three factors affecting the 
reliability of the estimates must be discussed. These factors are the sample size, the treatment 
of the total non-response and the estimation methodology. 

First, the sample size for the L1 list in the CWB region was reduced by 10% in relation to 
that of the corresponding list used in the previous sample design. This reduction was prompted 
mainly by the desire to lower costs. 

Second, the methodology used to treat total non-response was modified in 1988. Previously, 
when a farm failed to respond to the survey, its data were imputed using the data from another 
farm in the same stratum. These imputed data enabled the sample to be completed to its original 
size. However, in 1988, the cases of total non-response were not imputed; instead only the 
respondent sample was used and the weighting factors adjusted upward. The actual sample 
is therefore reduced in relation to the former method. 
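As a rough illustration of the 1988 treatment, the sketch below inflates the design weight by the inverse of the response rate. Applying the adjustment within each stratum is an assumption, and the function names are hypothetical.

```python
def nonresponse_adjusted_weight(N_h, n_h, r_h):
    """Weight for respondents in a stratum with N_h farms on the frame,
    n_h farms sampled and r_h farms responding."""
    if not 0 < r_h <= n_h:
        raise ValueError("need 0 < r_h <= n_h")
    design_weight = N_h / n_h   # weight under full response
    adjustment = n_h / r_h      # upward adjustment for total non-response
    return design_weight * adjustment


def stratum_total(N_h, n_h, respondent_values):
    """Estimated stratum total based on the respondent sample only."""
    weight = nonresponse_adjusted_weight(N_h, n_h, len(respondent_values))
    return weight * sum(respondent_values)
```

With 100 farms in a stratum, 20 sampled and 16 responding, the weight rises from 5 to 6.25; the estimate rests on 16 observations rather than 20, which is the reduction in the actual sample noted above.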

In the 1988 survey, the total non-response rate varied between 2% and 13%, depending on 
the province. The national rate was 10%. Non-response rates are presented in detail in Table 3. 


Table 2 
Results of the Area Sample Design 

Province            N        H        n    n-once       m

Quebec            2,065     43      191      182      230
Ontario           2,687     49      195      185      259
Manitoba            794     at      201      264      305
Saskatchewan      1,496     26      328      308      477
Alberta           1,623     a2      328      319      434
B.C. (Peace)^a       54      7       36       32       58
Total             8,719    178    1,355    1,290    1,763


a Peace River district in British Columbia.


Table 3 
Total Non-response Rate (%) by Province 

Province        Refusals    No Contact    Total

P.E.I.            0.00         3.55        3.55
N.S.              0.00         2.18        2.18
N.B.              0.00         1.61        1.61
Quebec            1.71         6.56        8.27
Ontario            OF       ibieartal     13.38
Manitoba          3.45         4.03        7.48
Saskatchewan      4.06         6.46       10.52
Alberta           2.68         7.95       10.63
B.C.              1.78        10.28       12.06

Total             2.52         8.11       10.43

The last factor to be discussed is the estimation methodology. The usual estimators cor-
responding to a stratified simple random sample are used for list frames. For area frames, an
estimator described in Wolter (1986 pp. 19-26) and corresponding to a sample design with 
independent replicates is used. Provincial estimates are obtained by adding the contribution 
of the list and area frames since, as previously mentioned, these two frames are independent 
and represent mutually exclusive domains. Details on the estimation methodology are found 
in Lynch (1988). 
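For illustration, the sketch below uses the standard variance estimator for independent replicates (the mean of the replicate estimates, with variance based on their spread) and then adds the list and area contributions. It assumes that each replicate yields its own estimate of the area frame total; the function names are hypothetical and the details of the survey's estimator are those given in the references cited above.

```python
import math


def replicate_estimate(replicate_totals):
    """Point estimate and variance estimate from k independent replicates."""
    k = len(replicate_totals)
    if k < 2:
        raise ValueError("at least two replicates are needed")
    mean = sum(replicate_totals) / k
    variance = sum((t - mean) ** 2 for t in replicate_totals) / (k * (k - 1))
    return mean, variance


def combined_cv(list_total, list_var, area_total, area_var):
    """CV of a provincial estimate formed by adding two independent
    contributions: totals add and, by independence, so do variances."""
    total = list_total + area_total
    return math.sqrt(list_var + area_var) / total
```

With four replicates, the variance estimate has only three degrees of freedom, so it is itself rather variable.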


9. ASSESSING THE PERFORMANCE OF THE NEW DESIGN 


To assess the performance of the new design, the precision of the estimates obtained in 1988 
is compared first to that of the 1987 survey, then to the precision anticipated during the develop- 
ment of the sample design. 


9.1 The 1988 and 1987 Surveys Compared 


Two opposing tendencies affect a comparison of the precision of the estimates in
the 1988 and 1987 surveys. The 1988 estimates should be more precise because the 1987 sample
design was already four years old. However, the two sample size reduction factors described 
in section 8 would indicate less precise estimates for 1988. 

Precision is compared using the coefficient of variation of the provincial estimates obtained 
by combining the L1 list and area frames. The estimates used are those for several key variables 
whose coefficient of variation in 1987 did not exceed 20%. 

The precision of 234 estimates is compared in the charts in Figure 2, where each square 
represents the CV achieved in 1987 on the x-axis and achieved in 1988 on the y-axis for a given 
estimate. The frequency (as a percentage) of the key variables located within each zone 
delineated by the straight lines Y = X/2, Y = X and Y = 2X is also presented. 
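The zone frequencies reported in the charts could be tabulated along the following lines. The sketch is illustrative only: the treatment of points falling exactly on a line and the function names are assumptions, and any CV values supplied to it are hypothetical.

```python
from collections import Counter

ZONES = ("Y <= X/2", "X/2 < Y <= X", "X < Y <= 2X", "Y > 2X")


def zone(cv_1987, cv_1988):
    """Zone of the point (X, Y) = (CV in 1987, CV in 1988)."""
    if cv_1988 <= cv_1987 / 2:
        return ZONES[0]   # at least twice as precise in 1988
    if cv_1988 <= cv_1987:
        return ZONES[1]   # more precise in 1988
    if cv_1988 <= 2 * cv_1987:
        return ZONES[2]   # less precise, but by less than a factor of two
    return ZONES[3]       # more than two times less precise


def zone_frequencies(cv_pairs):
    """Percentage of estimates falling in each zone."""
    counts = Counter(zone(x, y) for x, y in cv_pairs)
    return {z: 100.0 * counts[z] / len(cv_pairs) for z in ZONES}
```

The first two zones together give the share of estimates that were more precise in 1988 than in 1987.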

Nearly 60% of crop estimates were more precise in 1988 than in 1987. The majority of those 
that were less precise were so to a small degree only. Close to 95% of livestock estimates were 
more precise in 1988 than the previous year; in fact, 32% of the estimates were even twice as 
precise. Finally, over 60% of operating expense estimates were more precise in 1988. Some
of the 1988 estimates were a good deal less precise than in 1987, and 7% were even two times less precise.
The latter are from Quebec and Ontario, where data on operating expenses are collected from 
incorporated farms only. Furthermore, the legal status of a farm in these provinces is diffi-
cult to identify, both in the Census and in the survey.

Despite the reduction in the effective sample due to total non-response and cutbacks during 
the sample design development stage, the 1988 survey generally provided more precise estimates 
for each category of variables. 


9.2 Precision Obtained Versus Precision Anticipated 


The precision obtained is expected to be inferior to the precision anticipated for two reasons. 
First, when the weighting factors are adjusted to account for the total non-response, the variance 
increases slightly. Second, the data used to create the sampling frame were taken from the 1986 
Census of Agriculture. These data are subject to error and the sampling frame deteriorates 
with changes in agricultural activity. 

Precision is compared using the coefficient of variation of L1 list frame provincial estimates 
only. These estimates are for several key variables whose anticipated CV did not exceed 20%.

Figure 2. Comparison of the Precision of Key Variable Estimates in the 1987 and 1988 Surveys by
Category of Questions.

Figure 3. Comparison of the Precision of Key Variable Estimates Obtained in the 1988 Survey and
the Precision Anticipated during Development of the Sample Design.

A comparison of the precision of 288 estimates is presented in chart form in Figure 3. In
these charts, each square represents the anticipated CV on the x-axis and the obtained CV in
1988 on the y-axis for a given estimate. The frequency (as a percentage) of the key variables
located within each zone delineated by the straight lines Y = X, Y = 2X and Y = 3X is shown
in the charts.

For the crop and livestock categories, approximately 90% of the estimates are sufficiently 
precise, given the non-response rate, as most of the key variables are located closer to straight 
line Y = X than to straight line Y = 2X. Two tendencies can be seen for the operating expense
estimates. Surprisingly, the CV obtained is lower than the anticipated CV in 28% of the cases, 
the vast majority of which are found in the CWB region. However, 31% of all estimates are 
more than two times less precise than anticipated. These cases are found in Quebec and Ontario 
for the reasons given in section 9.1. 

Finally, a complementary study was conducted in which the precision obtained was com- 
pared to the anticipated precision based on the size of the sample actually observed. This study 
revealed that the frequency of estimates at least two times less precise than anticipated dropped 
from 12% to 5% for crops, from 9% to 5% for livestock and from 31% to 7% for operating 
expenses. 

These studies show that in general the precision obtained is acceptable and differs from the 
anticipated precision mainly because of the treatment for total non-response. This indicates
that the sample design is sound and the L1 list frame is adequate. On the other hand,
less precise estimates were obtained for operating expenses due to a problem in identifying incor- 
porated farms in Quebec and Ontario in the Census and in the survey. Finally, the list frame, 
which was two years old at the time of the survey, was observed to have deteriorated some- 
what due mostly to bankruptcies and farm sales. 


10. CONCLUSION 


In general, survey results were substantially improved following implementation of the new 
sample design. Moreover, the reduction in sample sizes led to cost savings and a considerable 
reduction in the response burden on the farmers surveyed. Difficulties remain, however, 
especially regarding the operating expense variables for incorporated farms in Quebec and 
Ontario. Further studies to resolve these difficulties are being envisaged.


Does the Method Matter on Sensitive Survey Topics?


DAVID A. HAY¹


ABSTRACT 


The effects of utilizing a self-administered questionnaire or a personal interview procedure on the responses 
of an adolescent sample on their alcohol consumption and related behaviors are examined. The results 
are generally supportive of previous studies on the relationship between the method of data collection 
and the distribution of responses with sensitive or non-normative content. Although of significance in 
a statistical sense, many of the differences are not of sufficient magnitude to be considered significant
in a substantive sense. 


KEY WORDS: Data collection; Personal interview; Self-administered questionnaire; Response errors; 
Alcohol consumption. 


1. INTRODUCTION 


To “questionnaire” or to interview: that is the question to be answered by researchers in
the design and conduct of sample surveys on delicate or sensitive topics. The decision on whether
to utilize personal or telephone interviews, a variant of the self-administered questionnaire,
or a combination thereof is a critical one that survey researchers have to make in attemp-
ting to optimize the quality of the resultant data.

Encompassed by the more general problems of reliability and validity associated with self- 
reports of attitudes, behaviour and other phenomena of interest to survey practitioners, is the 
question regarding the relative merits of the interview and self-administered formats in 
minimizing or reducing non-sampling biases or errors. In other words, would different
results be obtained from the utilization of different modes of data collection (Smith 1975)?

As far back as 1959, Selltiz et al. (1959) stated that most questionnaires and interviews were 
utilized without evidence of their relative merits. More recently, this position has been re- 
emphasized by Knudsen et al. (1967), Alwin (1977) and Newton et al. (1982) who maintain 
that the selection of the survey mode to be utilized is based on convenience, relative costs and 
other practical considerations rather than on their methodological adequacy and potential 
response effects. The planning of survey research, Newton et al. emphasize, should be deter-
mined by what is reliably known about the relationship between methods of administration
and response patterns, rather than just on the issues of relative costs, respondent motivation 
and other similar considerations. 

Some studies which have compared personal interviews with more anonymous formats such 
as self-administered questionnaires or telephone interviews have found minimal and/or 
statistically non-significant differences in the responses to a variety of topics including those 
of a private or sensitive nature (DeLameter and MacCorquodale 1975; Gibson and Hawkins 
1968; Krohn et al. 1974; McDonagh and Rosenblum 1965; Metzner and Mann 1952; Newton
et al. 1982 and Sykes and Collins 1987). Other researchers have observed that more candid,
self-revelatory and informative responses are more likely to be made by questionnaire and 
telephone respondents than personal interviewees on topics concerning deviant, sensitive
or embarrassing behaviours and attitudes (Cannell and Fowler 1963; Ellis 1947; Hubbard
et al. 1976; Knudsen et al. 1967; Siemiatycki 1979; Whitehead and Smart 1972 and Wiseman
1972).

The conclusions of the latter studies were generally based on the untested assumption that 
the increased reporting of deviant, threatening or embarrassing information was more accurate 
(Blair et al. 1977). This point was also emphasized by Schuman (1980) who stated that fre-
quently no external validation data were obtained, but the researchers “assumed that the more
such behaviour was reported, the more accurate the reports - a plausible but not air-tight
assumption for most of the topics they dealt with.”

The present note is concerned with a further comparison of the relationship between per-
sonal interviews and self-administered questionnaires and responses obtained from an adoles-
cent population on a “threatening” or deviant topic, namely alcohol consumption. The results
being reported are based on a secondary analysis of data from a study of alcohol-related
attitudes and behaviors from a sample of teenagers in a Western Canadian province completed
in 1977-78 (Hetherington et al. 1978 and 1979).

The study, which utilized both personal interviews and self-administered questionnaires, pro-
vides a unique opportunity to compare the potential effects of the mode of data collection on
the resultant data. This type of comparison, of interest to survey practitioners, is generally not
possible in the majority of surveys, which tend to rely on one method of data collection.

A stratified random sample of 1502 students in grades 6 to 12 was selected from three school 
regions in the Province of concern. The total sample of students was randomly assigned by 
grade to either the self-administered questionnaire or to the personal interview procedure. 
Approximately one half of the students from each grade 6 to 12 were thus allocated to one 
of the procedures. The number of students assigned to be interviewed was 752, with 750 students
being assigned to the questionnaire data collection.

The questionnaire was group administered by a trained researcher in a room made available 
at each school for that purpose. The interviews were conducted by fifteen interviewers 
specifically trained for the study. 

The survey instrument, which consisted of 75 questions, was identical in content for both
the interview and questionnaire data collection procedures. The majority of the questions were
closed-ended, and the instrument required an average of 20 minutes for completion in both types of adminis-
tration.


2. RESULTS AND DISCUSSION 


A comparison of the personal interview and self-administered questionnaire respondents 
on a number of personal and familial characteristics was conducted to determine if the two 
groups differed in respects other than the method of data collection. The results indicated 
that the two groups did not differ by more than could be attributed to chance on variables 
such as sex, age, grade of enrollment, parents’ educational and occupational backgrounds
and religious affiliation. A statistically significant difference was observed on the variable 
of ethnicity with a higher percentage of Canadian identities reported by the interview 
respondents. 

With the exception of ethnic background, the subsequent analysis was, therefore, based on 
the assumption that the interview and questionnaire respondents were equivalent on a number 
of variables that could potentially confound the comparison of obtained responses to the two 
procedures.


Table 1

Frequency Distribution and Z Probabilities on Selected Questions
for Interview and Questionnaire Results

Variable                  Interview     Questionnaire     Two-tailed Z
                          (n = 752)     (n = 750)         Probability

Ever drink                  62.63          73.13             .000
Ever used cigarettes        29.78          37.60             .001

2.1 Variable Distribution 


A comparison of the mean responses or frequency distributions for the interview and ques- 
tionnaire respondents on a number of questions with non-normative or illegal content lent 
general support to previous research on similar issues. The questions of primary concern are 
those related to the consumption of alcohol which are viewed as possessing a considerable degree 
of threat or deviant content for the population under consideration, the majority (99.8%) of 
whom were under the legal drinking age at the time of the study. 

The frequency distributions in Table 1 indicated that a significantly higher percentage of 
the questionnaire respondents reported ever having more than a sip or taste of an alcoholic 
beverage. Similar statistically significant differentials were observed between the interview and 
questionnaire respondents on reported smoking. 
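A comparison of this kind can be sketched with a two-sample test of proportions. The example below uses the cigarette percentages from Table 1 and assumes a pooled standard error and the full group sizes of 752 and 750; the source does not state the exact form of the test, so the result is indicative only.

```python
import math


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic and two-tailed probability for the difference between
    two independent proportions, using a pooled standard error."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-tailed normal probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_tailed


# "Ever used cigarettes": 29.78% of interviewees, 37.60% of questionnaire respondents.
z, p = two_proportion_z(0.2978, 752, 0.3760, 750)
```

Under these assumptions the statistic is about -3.2, with a two-tailed probability slightly above 0.001.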

For those respondents reporting that they had consumed a drink of alcohol, the mean
drinking levels and average age at first drink shown in Table 2 also suggested that the
questionnaire respondents were more likely to report deviant behaviour than were their inter-
view contemporaries. The significantly higher average drinking levels for the questionnaire
respondents reflect their reporting higher amounts and frequencies of alcohol consumption.
The significantly higher average age at first drink for the interviewees indicates their reporting
taking their first substantial drink at an older age than did the questionnaire respondents.

Significant differentials between the interview and questionnaire respondents were also 
observed on the reporting of parental drinking and on the importance of religion in the home 
questions. The mean values for these three questions indicated that the questionnaire 
respondents reported higher drinking levels for their parents than did the interviewees and that 
religion was perceived as being less important in the homes of the questionnaire respondents. 
While not possessing the same degree of self revelation or threat to the respondent per se, the 
differentials were viewed as suggestive of an attempt on the part of the interviewees to por- 
tray a more favourable or socially acceptable image about their family life. 

However, the greater importance of religion in the home reported by the interviewees was 
not carried through in their self-descriptions of the importance of religion. The statistical 
equivalence of the mean values on the importance of religion to self indicated that the inter-
view respondents were no more likely to report that religion was important to self than were
the questionnaire respondents. The two groups of respondents were also equally likely to report 
on the drinking habits of friends or peers. 

The response patterns on other questions possessing somewhat different aspects of ego- 
involvement or image favourability did not generally support the potential operation of a social 
desirability effect as was evident for the alcohol related behaviours. As indicated in Table 2, 
the questionnaire respondents reported receiving significantly higher school grades, had higher 
educational aspirations in terms of their future educational plans and reported more positive 
self images on 4 of the 7 self-esteem items and on the composite self-esteem index. Contrary
to the expectation that the interviewees would attempt to portray a more favourable image,
these results tended to indicate that they were more modest in the reporting of school grades
received, in their educational aspirations and in their self perceptions. However, the greater
anonymity and potential freedom afforded the questionnaire respondents to more willingly
report on their alcohol related behaviors may also have resulted in a similar perceived freedom
to aggrandize their own merits in relation to these questions on school grades, educational plans
and their self conceptions.


Table 2

Means, Standard Deviations and “t” Probabilities on Selected Questions
for Interview and Questionnaire Respondents

                                  Interview          Questionnaire
                                  (n = 752)          (n = 750)         Two-tailed “t”
Variable¹                         X̄        SD        X̄        SD       Probabilities

Alcohol and Related Behaviour
  Drinking level                  Deol     2.92      2.76     3.05        .003
  Age at first drink^a            3.93     hes 74    3.64     1.39        .001
  Father drinks                   1.82     0.62      1.90     0.58        .011
  Mother drinks                   1.70     0.50      ia       0.51        .025
  Friends drink                   1.92     0.57      1.94     0.56        .481

Educational Variables
  Grades received                 4.37     1.49      4.58     1.46        .008
  Educational plans               3.02     1.24      BA:      1.24        .001

Religious Variables
  Importance of religion
  in the home                     oe HW    1.16      6 Hl i: | Ne         .000
  Importance of religion
  to student                      222      1.12      3tt3     1.18        .130

Self-Esteem Indices
  Item 1                          2.98     0.60      3.12     0.60        .000
  Item 2                          2.96     0.49      3.08     0.54        .000
  Item 3                          3.14     0.55      A        0.61        .000
  Item 4                          2.98     0.51      2.05     0.57        .033
  Item 5                          3.10     0.63      3.01     0.75        .017
  Item 6                          2.93     0.56      2.97     0.59        .207
  Item 7                          2.07     0.54      3.12     0.60        .132
  Composite                      21.17     2.39     21.65     2.85        .001


a Mean value calculated on grouped data.

1 Variable Codes: Drinking level: composite index of frequency and volume of alcohol consumed, 0 = abstainer to
9 = frequent consumer of large amount of alcohol.
Age at first drink: 1 = 6 years or less; 2 = 7-8 years; 3 = 9-10 years; 4 = 11-12 years; 5 = 13-14 years; 6 = 15-16
years; and 7 = ≥ 17 years.
Father, mother and friends drink: 1 = never drinks; 2 = drinks sometimes; 3 = drinks a lot.
Grades received: 1 = mostly D’s and F’s; 2 = mostly C’s and D’s; 3 = mostly C’s; 4 = mostly B’s and C’s;
5 = mostly B’s; 6 = mostly A’s and B’s and 7 = mostly A’s.
Educational plans: 1 = will not finish grade 12; 2 = will finish grade 12 only; 3 = will take technical training;
4 = will attend university and 5 = will go to graduate or professional school.
Self-esteem items and index: 1 = strongly disagree; 2 = disagree; 3 = agree and 4 = strongly agree. The additive
index for the 7 items ranged from 7 to 28.


However, the presence of a significant distributional response bias between the interview-
questionnaire data collections is evident only in the statistical sense of the term. The statistically
significant mean value differences on the questions of concern ranged from 0.05 to a maximum 
of 0.48 on the composite self-esteem index. Given the potential presence of other errors of 
measurement, the interview-questionnaire response differentials obtained in the present study 
are not of sufficient magnitude to be considered as indicative of a response bias effect of 
substantive or practical importance. 
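One way to express the distinction between statistical and substantive significance is a standardized mean difference, that is, the difference in means divided by the pooled standard deviation. This measure is not used in the source; it is introduced here purely as an illustration. The sketch applies it to the grades received item of Table 2 and assumes the full group sizes of 752 and 750, although the number actually answering each item may have been smaller.

```python
import math


def standardized_difference(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference in means expressed in pooled standard deviation units."""
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_2 - mean_1) / math.sqrt(pooled_var)


# Grades received: interview 4.37 (SD 1.49), questionnaire 4.58 (SD 1.46).
d = standardized_difference(4.37, 1.49, 752, 4.58, 1.46, 750)
```

The resulting value, roughly 0.14 standard deviations, is small even though the difference is statistically significant with samples of this size.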

Due to the unavailability of reliable information on the actual drinking habits of the students 
and their parents, the school grades and other responses under consideration, it was not possible 
to conduct an evaluation of the relative accuracy of the interview and questionnaire responses. 
As a result it is not possible to indicate the relative superiority of either the self-administered 
mode or the personal interview for the question responses under consideration. Both types of 
responses may be subject to an under- or over-reporting bias of an indeterminate direction
and/or magnitude. 

The results of this note are in general agreement with Bradburn and Sudman (1979) who 
indicate that no consistent relationship appears to exist between the method of survey admin- 
istration and the over-reporting of socially desirable behaviour or the under-reporting of socially 
undesirable behaviors and attitudes. As a result Bradburn and Sudman (1979) and Locander 
et al. (1976) suggest that no data collection procedure is clearly superior for all types of 
threatening or other questions of concern to survey practitioners.


Use of Cluster Analysis for Collapsing
Imputation Classes


E.R. LANGLET¹


ABSTRACT 


The problem of collapsing the imputation classes defined by a large number of cross-classifications of 
auxiliary variables is considered. A solution based on cluster analysis to reduce the number of levels of 
auxiliary variables to a reasonably small number of imputation classes is proposed. The motivation and 
solution of this general problem are illustrated by the imputation of age in the Hospital Morbidity System 
where auxiliary variables are sex and diagnosis. 


KEY WORDS: Item nonresponse; Auxiliary variables; Imputation matrix; Donors; Disjoint techniques; 
Hierarchical techniques; Cluster seeds. 


1. STATEMENT OF THE PROBLEM 


In surveys, the problem of item nonresponse occurs when some but not all information is 
collected for a sample unit or when some information is deleted because it fails to satisfy edit 
constraints. In many surveys, this problem is handled by random imputation within classes, 
a common form of hot deck imputation method. For this type of imputation, a respondent 
is chosen at random within an imputation class defined by one or more auxiliary variables and 
the respondent’s value is assigned to the nonrespondent. 
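A minimal sketch of random imputation within classes is given below. The record layout, the function name and the decision to leave a value missing when a class has no donor are assumptions made for the illustration.

```python
import random
from collections import defaultdict


def hot_deck_impute(records, class_key, target, seed=None):
    """Random hot deck imputation within classes.

    records   : list of dicts; a missing target value is stored as None
    class_key : function mapping a record to its imputation class
    target    : name of the variable to impute
    """
    rng = random.Random(seed)
    # Pool the reported values (donors) by imputation class.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[class_key(rec)].append(rec[target])
    imputed = []
    for rec in records:
        new_rec = dict(rec)   # leave the input records untouched
        if new_rec[target] is None:
            pool = donors.get(class_key(rec))
            if pool:          # a class without donors cannot be imputed
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed
```

A class with no donors leaves the value missing in this sketch, which is the difficulty that motivates collapsing the classes.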

The problem considered in this paper can be defined as follows. The classifications of the 
respondents according to certain auxiliary variables form a multi-dimensional imputation 
matrix where the number of imputation classes equals the number of cross-classification cells 
defined by the auxiliary variables. If the number of imputation classes is very large, few or 
no donors may be available in several classes. In addition, manipulation of this large matrix 
could be very cumbersome computationally. These problems can be alleviated by collapsing 
the cells of the matrix either by grouping the cells themselves, or the rows, columns or along 
some other dimension (or combination of dimensions) so that the resulting groups will be 
homogeneous with respect to the variables requiring imputation. We propose to use cluster 
analysis to achieve the desired level of collapsing. For this purpose, the values of the variables 
of interest from donors (or respondents) for each imputation class can be used to assign 
numerical scores to each class. In this paper, measures based on the empirical distribution func-
tion for respondent data are used to quantify imputation classes. Cluster analysis can then be
used to group the cells of the matrix according to these numerical scores. It will be shown that 
cluster analysis is appropriate for the problem under consideration. Related useful references 
concerning the application of cluster analysis to stratify primary sampling units are Drew, 
Bélanger and Foy (1985), Judkins and Singh (1981) and other references contained therein. 
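As a hedged illustration of how numerical scores might be attached to imputation classes, the sketch below evaluates the empirical distribution function of the donor values at a few fixed cut points. The choice of cut points and the function names are hypothetical; the particular measures used in this paper are not reproduced here.

```python
from bisect import bisect_right


def edf_scores(donor_values, cut_points):
    """Empirical distribution function of the donor values evaluated at
    each cut point, i.e. the proportion of donors at or below it."""
    ordered = sorted(donor_values)
    return [bisect_right(ordered, c) / len(ordered) for c in cut_points]


def score_classes(donors_by_class, cut_points):
    """Score vector for every imputation class that has at least one donor."""
    return {cls: edf_scores(values, cut_points)
            for cls, values in donors_by_class.items() if values}
```

Classes whose score vectors are close have similar distributions of the variable of interest and are therefore natural candidates for being grouped together.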

The above mentioned problem arose in the context of age imputation in the Hospital 
Morbidity System (HMS). This system uses the auxiliary variables sex and diagnosis as the basis 
for imputing the age. The number of imputation classes was over 5,000 for each sex. A solu-
tion based on the technique of cluster analysis was proposed in order to collapse the levels of
the diagnosis variable to 40 groups of related diagnoses. In section 2, a brief review of the
commonly used cluster analysis techniques is presented. Use of cluster analysis for the problem 
of collapsing imputation classes is illustrated for the example of imputation of age for the HMS 
data in section 3 including the relative performance of the proposed method with respect to 
the current method. Both methods utilize a hot deck approach but the proposed method 
redefines the imputation classes using cluster analysis. Some concluding remarks including 
possible generalizations of the method are given in section 4. 


2. CLUSTER ANALYSIS TECHNIQUES: 
A BRIEF REVIEW 


The problem of classifying a given number of entities described by a number of quantitative 
variables into groups such that entities within the same groups or clusters will be similar to 
each other and dissimilar to entities in different groups is considered in this section. A good 
review of clustering techniques is given by Everitt (1980) mainly based on the work of Cormack 
(1971). Most clustering techniques can be classified into two groups, namely ‘hierarchical tech- 
niques’ and ‘disjoint techniques’, the latter one also known as ‘optimization techniques’. These 
two groups of techniques will be described below. Other methods include density techniques,
in which clusters are formed by searching for regions containing dense concentrations of entities.
This is based on the fact that if entities are described as points in a metric space, there should 
be parts of the space in which the points are very dense, separated by parts of low density. 
Another class of techniques is called clumping techniques in which the clusters can overlap. 
In certain fields such as language studies, for example, classification must permit an overlap 
between the classes because words tend to have several meanings, and if they are classified by 
their meanings they may belong in several places. 

Hierarchical techniques can be subdivided into ‘fusion techniques’ and ‘divisive techniques’. 
In fusion methods, each entity begins in a cluster by itself. At each step, the two closest clusters 
are fused to form a new cluster until only one cluster containing all the observations is left. 
In divisive techniques, all entities are first grouped into one cluster. Then, at each step, groups 
of the entities are successively broken down into finer partitions until each entity constitutes 
a cluster by itself. Hierarchical techniques differ with respect to the definition of the distance 
measure between observations or groups of observations. An advantage of hierarchical techniques 
is that a single run can produce solutions for any number of clusters, from one upward, by stopping 
the fusion or division process at the desired level of the hierarchy. However, hierarchical 
techniques are practical only for small data sets, since there are $n(n - 1)/2$ ways to fuse 
two entities in a group of $n$ entities and $2^{n-1} - 1$ ways to break a group of $n$ entities 
into two groups. 
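
To give a feel for how quickly these counts grow, the two expressions can be evaluated directly. This is only an illustrative sketch; the function names `fusion_pairs` and `binary_splits` are hypothetical and do not come from any clustering package.

```python
def fusion_pairs(n: int) -> int:
    """Number of ways to pick two of n entities to fuse: n(n - 1)/2."""
    return n * (n - 1) // 2


def binary_splits(n: int) -> int:
    """Number of ways to break n entities into two non-empty groups: 2^(n-1) - 1."""
    return 2 ** (n - 1) - 1


if __name__ == "__main__":
    for n in (5, 20, 50):
        print(n, fusion_pairs(n), binary_splits(n))
```

The divisive count grows exponentially with n, which is why exhaustive division is feasible only for very small groups.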

In contrast to hierarchical techniques where observations belong to a series of clusters depen- 
ding on the level of the hierarchy, disjoint techniques divide observations into a number of 
clusters (generally predetermined) such that each observation belongs to one and only one 
cluster. They also differ from hierarchical techniques in that they admit relocation of the obser- 
vations so that a poor initial partition can be corrected at a later stage. Disjoint techniques 
are clearly more appropriate than hierarchical techniques to handle large data sets. Disjoint 
techniques are also called optimization techniques because they seek a partition of the data 
that optimizes some predefined criterion. Disjoint techniques differ in the way the 
methods obtain an initial partition and in the clustering criterion they try to optimize. Usually, 
disjoint techniques start by selecting a set of points called cluster seeds as a first guess of the 
means of the clusters. A number of procedures have been suggested for choosing these points 
(Anderberg 1973). Once the cluster seeds have been selected, the entities are then assigned to 
the closest cluster seeds (usually, the Euclidean distance is used). Estimates of the cluster means 
might be updated after each allocation (MacQueen 1967) or after all entities have been allocated 
(Ball and Hall 1967). Once an initial partition has been found (which is equivalent to finding 
a set of cluster seeds and to allocating each entity to the closest cluster seed), a search is made 
for entities whose re-allocation to some other group will improve the clustering criterion. This 
procedure is repeated until no further move of a single entity improves the clustering criterion. 
A local optimum is then reached. This is what Anderberg (1973) calls ‘nearest centroid sorting’. 
In general, there is no way to know whether a global optimum has been reached. 
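
The relocation procedure described above can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and assuming that the cluster means are updated only after all entities have been allocated; the function names are hypothetical, and the code is not the implementation used by any particular package.

```python
import math


def nearest(point, centres):
    """Index of the centre closest to `point` (Euclidean distance)."""
    return min(range(len(centres)), key=lambda j: math.dist(point, centres[j]))


def nearest_centroid_sorting(points, seeds, max_iter=100):
    """Allocate points to the nearest centre, recompute means, repeat.

    Stops when no entity changes cluster, i.e. at a local optimum.
    """
    centres = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in points]
        if new_labels == labels:  # no re-allocation improves the partition
            break
        labels = new_labels
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres
```

Running the function from different seeds and comparing the resulting criterion values is one practical way to guard against a poor local optimum.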


3. APPLICATION: FORMING IMPUTATION 
CLASSES FOR THE HMS 


3.1 Background 


The Hospital Morbidity System (Statistics Canada 1987) consists of a count of inpatient 
cases, discharged during the year from general and allied special hospitals in Canada except 
Yukon and Northwest Territories. Each record of the system contains at least one diagnosis 
code, the age and sex of the patient, the length of stay, etc. The first valid diagnosis on the 
record is called the tabulating diagnosis and is the diagnosis on which tabulations are based 
in the publications. This diagnosis can be seen as the main cause for which the patient is 
hospitalized and is coded according to the 9th Edition of the International Classification of 
Diseases (World Health Organization 1977) which contains more than 5,000 diagnoses. 

The age imputation problem in the HMS is currently treated by a hot deck method. To predict 
the age of the patient, $y$, two auxiliary variables are used, namely 
the tabulating diagnosis $d$, which is always present on the record, and the sex of the patient $s$. 
If the sex is missing, it is imputed first, according to the observed male/female 
proportions for $d$ over previous years. Classification of the patients according to $d$ and $s$ forms 
an imputation matrix with more than 5,000 × 2 imputation classes. In order 
to reduce the dimension of the imputation matrix, diagnoses were regrouped or collapsed, based 
on the age distribution of each diagnosis. Let $F_d$ denote the age distribution in the population 
of patients with tabulating diagnosis $d$. Then, diagnoses $A$ and $B$ would be collapsed together 
if $F_A$ is close to $F_B$. Estimates of $F_d$ from available data can be used for this purpose. It should 
be noted that the sex variable was not used in defining imputation classes (see section 4 for 
details on how it could be used) although it was used in the imputation scheme. By not using 
the sex variable for defining imputation classes, the number of imputation classes of the 
imputation matrix is reduced by half. 

In order to motivate the proposed method for collapsing imputation classes, we will first 
describe the current method and its limitations. The collapsed groups were created by manually 
comparing (using histograms) the shapes of the empirical age frequency distributions, 
$F_d$, of all diagnosis codes in the 1974 HMS data. Thirty-six groups were obtained 
and a 37th group was created for those diagnoses for which fewer than 200 observations were 
available. The number of groups was determined arbitrarily, a posteriori. The main deficiency 
of the current method is that no statistical criterion was used to group 
diagnoses, which makes the method labour intensive and somewhat subjective. An evaluation 
of the current imputation method indicated that the resulting groups of diagnoses were, 
in a few cases, not homogeneous with respect to $F_d$ and consequently needed to be updated. 


3.2 Proposed Method 


The proposed method can be briefly described as follows. We shall consider the case when 
only one quantitative variable needs to be imputed. Extension to cases where more than one 
variable requires imputation is discussed in section 4. Let $y$ denote the variable to be 
imputed and $F_i$ the distribution of $y$ in class $i$. Note that the classes are defined 
by the cross-classification of one or more auxiliary variables which are suitably categorized 
if necessary. The first step is to find an appropriate set of parameters to represent $F_i$ in each 
class, for example, the first three or four moments of the $F_i$'s or the percentiles. The next step 
is to estimate these parameters from the respondent data. Finally, a suitable technique of cluster 
analysis on the set of estimated parameters can be used to condense the number of classes such 
that classes grouped together will be similar with respect to the parameters representing the 
$F_i$'s. 

A justification for the choice of the proposed method in the context of the age imputation 
for the Hospital Morbidity System (HMS) will now be presented. First, consider some possible 
alternative strategies to the collapsing problem. One strategy for this problem might be similar 
to the original method that was used for 1974 data, that is, to group diagnoses according to 
the distributions $F_d$ but using a statistical criterion for grouping instead of manually comparing 
histograms. Data would be cross-classified by tabulating diagnosis, sex and a number 
of age groups, say 10. Two diagnoses would be grouped together if the proportions of cases 
in each of these ten age groups, $p_1, \ldots, p_{10}$, were judged to be close to each other according 
to some criterion such as the Euclidean distance or a chi-square measure. Note that the use 
of a chi-square measure would cause serious computational burden since no commonly available 
cluster analysis program uses this distance measure. This would imply the calculation of the 
chi-square distance for all possible pairs of diagnoses. Another possible strategy would be to 
first use data reduction techniques such as principal components to reduce the dimension of 
age groups and then decide whether two diagnoses are close based on principal component 
scores. An obvious disadvantage to all these methods is the number of observations required 
to obtain a reliable estimate of the categorical age distribution for each diagnosis. 

In view of the above problem, we decided to use the first two or three moments to describe 
$F_d$ approximately. We started with three: the mean $m_d$, the standard deviation $s_d$ and the 
skewness coefficient $b_d$. However, it was found by means of principal component analysis 
that it was not necessary to include $b_d$. The approach then is to collapse diagnoses according 
to the sample mean, $m_d$, and the sample standard deviation, $s_d$. Cluster analysis provides 
a suitable statistical technique for this purpose. An obvious advantage of this 
approach over other strategies based on the categorical distribution of age is that reliable 
estimation of two moments requires far fewer observations than estimation of the proportion 
of cases in several age groups. In section 3.3, implementation of this approach is 
described for the problem of age imputation. 


3.3 Procedure Steps in the Implementation of the Proposed Method for HMS Data 


There are four steps in implementing the proposed collapsing method based on cluster 
analysis for the age imputation problem for HMS data. 


Step I: Selection of a clustering method 


Before selecting a clustering method, it should be noted that our goal is primarily to partition 
the diagnoses into homogeneous groups without trying to uncover ‘natural’ or ‘real’ 
clusters. This is called ‘data dissection’ in the literature (Everitt 1980). Another important 
consideration is the availability of a well-tested clustering program using an efficient 
clustering method. The determining consideration for the selection of a clustering method 
was the number of observations in our data set, which resulted in the selection of a disjoint 
technique rather than a hierarchical technique. 

Taking into consideration the above points, the disjoint clustering technique used in the 
FASTCLUS procedure of SAS (1985) was chosen to do the analysis. This procedure per- 
forms a disjoint cluster analysis based on the usual Euclidean distances computed from 
a given set of quantitative variables. The FASTCLUS procedure combines an effective 
method for finding initial clusters (or initial clusters can be given by the user) with a stan- 
dard iterative algorithm for minimizing the sum of squared distances from the cluster 
means. FASTCLUS was directly inspired by Hartigan’s leader algorithm (1975) and 
MacQueen’s k-means algorithm (1967). A set of cluster seeds is first selected as a guess 
of the means of the clusters. Each observation is assigned to the nearest cluster seed to 
form temporary clusters. The cluster seeds are replaced by the means of the temporary 
clusters each time an observation is assigned (this is an option chosen for our applica- 
tion). After each pass through the data set, the observations are assigned to the nearest 
cluster seed until the changes in the cluster seeds become small or null (chosen to be null 
for our application). The final clusters are formed by assigning each observation to the 
nearest cluster seed. 


Step II: Estimation of parameters 


Two years of HMS data, from the 1982-83 and 1983-84 fiscal years, were gathered to get estimates 
$m_d$ and $s_d$ for each diagnosis $d$. These estimates were the usual weighted estimates over 
the two-year period. Each diagnosis is represented by two variables, $m_d$ and $s_d$. The 
problem is now reduced to finding an appropriate partition of the diagnoses according 
to $m_d$ and $s_d$. Three special groups of diagnoses judged as outliers were removed. These 
three special groups will form the first three rows of the imputation matrix (the columns 
are defined by the sex variable). A catch-all category was created in the last row of the 
imputation matrix for those diagnoses with, say, fewer than ten observations available 
over the two years of data and not included in the three special groups. The choice for 
the upper bound of ten observations was made arbitrarily. Cluster analysis can then be 
used to group the remaining diagnoses not included in the three special groups with at 
least ten observations available. 
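
The bookkeeping in this step can be sketched as below. The sketch assumes records arrive as hypothetical `(diagnosis, age)` pairs already pooled over the two years, uses the ordinary sample standard deviation, and leaves out the removal of the three special outlier groups; the function name `summarise` and the data layout are illustrative assumptions only.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise(records, min_obs=10):
    """Return per-diagnosis (mean, sd, count) and the catch-all diagnoses.

    records: iterable of (diagnosis, age) pairs.
    Diagnoses with fewer than `min_obs` ages go to the catch-all list.
    min_obs must be at least 2 so that the standard deviation is defined.
    """
    ages = defaultdict(list)
    for diagnosis, age in records:
        ages[diagnosis].append(age)

    to_cluster, catch_all = {}, []
    for diagnosis, values in ages.items():
        if len(values) < min_obs:
            catch_all.append(diagnosis)
        else:
            to_cluster[diagnosis] = (mean(values), stdev(values), len(values))
    return to_cluster, catch_all
```

Only the diagnoses in `to_cluster` would then be passed to the cluster analysis, with the counts kept for later use as weights.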


Step III: Determination of the number of clusters 


The determination of the number of clusters was dictated by operational constraints since 
the imputation module of the program doing the imputation will accept a maximum 
number of rows not larger than 40. Since there are already three rows for special diagnoses 
and one row for diagnoses with fewer than ten observations, the maximum number of 
other rows that would not affect the program is then 36. A small empirical study 
calculating the $R^2$ coefficient for different numbers of clusters indicated that the $R^2$ 
coefficient was already above 98% for 36 clusters, suggesting that 36 clusters was acceptable. 
Note that even with 15 clusters, the $R^2$ could be made as high as 95%. The definition of 
the $R^2$ coefficient is given in section 3.4. 


Step IV: FASTCLUS implementation 


First, an initial partition of the observations into 36 groups was chosen (equivalent to 
choosing a set of 36 cluster seeds). Better results were obtained by selecting an initial set 
of cluster seeds than by letting FASTCLUS find initial cluster seeds. Note that different 
initial cluster seeds and different orders of the input data set will yield different results 
due to the fact that the method produces only locally optimal partitions. To select cluster 
seeds, diagnoses were divided into nine groups of roughly the same size according to $m_d$ 
and four groups of roughly the same size according to $s_d$. This procedure produced 36 
homogeneous groups of diagnoses of approximately the same size. The means of the two 
variables $m_d$ and $s_d$ in each group were taken as initial cluster seeds. Several other 
variations were tried and the procedure giving the largest $R^2$ was chosen. 

Second, since $m_d$ and $s_d$ were based on very different numbers of observations for different 
diagnoses, it was judged preferable to perform a weighted cluster analysis, the 
weights being the number of observations available for each diagnosis. Note that, in this 
case, FASTCLUS would minimize the weighted within-cluster sum of squares instead of 
an unweighted within-cluster sum of squares. 
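
One possible reading of the seed selection and of the weighting is sketched below. It assumes that the four standard-deviation groups are formed within each of the nine mean groups (the text does not say whether the groups are nested or crossed), and the function names are hypothetical. It is not the FASTCLUS code.

```python
def split(items, parts):
    """Split a list into `parts` consecutive chunks of roughly equal size."""
    q, r = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def grid_seeds(stats, n_mean_groups=9, n_sd_groups=4):
    """stats: list of (m_d, s_d) pairs. Returns one seed per non-empty cell."""
    seeds = []
    for by_mean in split(sorted(stats), n_mean_groups):
        # Assumption: sd groups are nested inside each mean group.
        for cell in split(sorted(by_mean, key=lambda t: t[1]), n_sd_groups):
            if cell:
                seeds.append((sum(m for m, _ in cell) / len(cell),
                              sum(s for _, s in cell) / len(cell)))
    return seeds


def weighted_centroid(members):
    """members: list of (m_d, s_d, n_d); the counts n_d act as weights."""
    total = sum(n for _, _, n in members)
    return (sum(m * n for m, _, n in members) / total,
            sum(s * n for _, s, n in members) / total)
```

With weights, a diagnosis observed many times pulls its cluster centre more strongly than a rarely observed one, which is the intent of the weighted analysis described above.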


3.4 Relative Performance of the Proposed Method 


One way to compare the current and proposed methods for collapsing imputation classes is to 
use the $R^2$ coefficient pooled over all variables (in our case, the mean and the standard 
deviation). The pooled $R^2$ coefficient is the proportion of the total variance explained 
by the between-cluster pooled sum of squares (which should be as large as possible). Each pooled 
sum of squares is defined as $(SSQ_m + SSQ_s)/2$ where $SSQ_m$ and $SSQ_s$ are the sums of 
squares of the mean and the standard deviation respectively. The $R^2$ coefficients obtained 
from FASTCLUS were 0.993 for $m_d$ and 0.929 for $s_d$, for a pooled $R^2$ value of 0.986. The current 
classification of diagnoses into groups would yield an $R^2$ of 0.735 for $m_d$ and 0.466 for 
$s_d$, producing a pooled $R^2$ value of 0.705. Thus, in terms of $R^2$, results indicated that the 
groups of diagnoses formed using cluster analysis were much more homogeneous with respect 
to the variable being imputed than in the case where classes were formed by the earlier method. 
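
The pooled coefficient can be computed as in the sketch below. It is an unweighted illustration (the analysis reported above was weighted by the number of observations), and the function names are hypothetical. Because the same factor 1/2 appears in the pooled between-cluster and total sums of squares, it cancels in the ratio.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean."""
    centre = sum(values) / len(values)
    return sum((v - centre) ** 2 for v in values)


def within_ss(values, labels):
    """Within-cluster sum of squares for one variable."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    return sum(sum_sq(members) for members in clusters.values())


def r_squared(values, labels):
    """Share of the total sum of squares explained by the clusters."""
    return 1 - within_ss(values, labels) / sum_sq(values)


def pooled_r_squared(columns, labels):
    """R^2 pooled over several variables (one list of values per variable)."""
    total = sum(sum_sq(col) for col in columns)
    within = sum(within_ss(col, labels) for col in columns)
    return 1 - within / total
```

The pooled value is dominated by the variable with the larger total sum of squares, which is consistent with the pooled figures above lying closer to the coefficient for the mean than to that for the standard deviation.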


4. CONCLUDING REMARKS 


A methodology based on cluster analysis for collapsing the imputation classes of an imputa- 
tion matrix defined by the cross-classification of several auxiliary variables was proposed. This 
methodology was applied to the imputation of age for the Hospital Morbidity System where 
diagnosis and sex were used as auxiliary variables. 

It should be noted that in this specific application, only one variable, namely the diagnosis, 
was used to collapse the original imputation classes. The variable sex is, however, used later 
in the imputation scheme so that a recipient will be matched to a donor of the same sex. In 
a generalization of the proposed method, one may consider using the two variables, sex and 
diagnosis, in the collapsing process. For this purpose one might also impose some constraints 
that male and female cases of the same diagnosis belong to the same row in the final imputa- 
tion matrix. Alternatively, one could produce two final imputation matrices, one for each sex. 
In either one of these alternatives, the number of initial imputation classes would clearly be 
much higher and hence the collapsing problem more complex. In this situation, it is more likely 
for many classes to have a small number of donors and therefore many of the imputation classes 
would have to be assigned to the catch-all category. This, however, may not be desirable in 
practice. This problem can be simplified if one can make the assumption that, for most 
diagnoses, the male and female age distributions are similar to each other. There is some 
evidence based on significance tests that this is not an unreasonable assumption. In the HMS 
example considered, it was decided to group diagnoses based on estimates of $\mu_d$ and $\sigma_d$ from 
the data pooled over sex. 

It should also be noted that the choice of the mean and standard deviation of the age distribution 
to assign numerical scores to each imputation class was not investigated. Other choices might 
be percentiles or some other parameters of the age distribution. Clearly, the results of using 
cluster analysis for collapsing purposes would depend on the choice of the above scores. 

Finally, generalization of the proposed method to the case where $k \geq 1$ variables need to 
be imputed and where $p \geq 2$ auxiliary variables are available follows in a straightforward 
manner from the simpler case considered in this paper. 


An Example of the Use of Randomization Tests 
in Testing the Census Questionnaire 


YVES BELAND and ALAIN THEBERGE 


ABSTRACT 


Modular Test 2 was a survey conducted by Statistics Canada that used two different questionnaires. 
Its purpose was to assist in the making of the 1991 census questionnaire. The sample used for the survey 
was not a probability sample. This article briefly describes the survey methodology, and the use of ran- 
domization tests to compare the two questionnaires. 


KEY WORDS: Randomization tests; Non-probability sample; Experimental design. 


1. INTRODUCTION 


Statistical tests can be classified into two groups: randomization tests and classical tests. 
A classical test is based on a comparison of the observed value of a statistic with the distribution, 
under the null hypothesis, of the values of this statistic for the set of samples that could 
have been selected. To conduct this kind of test, the probability of selecting any given sample 
must be known; therefore, probability sampling using a known design is required. A randomization 
test is based on a comparison of the observed value of a statistic with the distribution, 
under the null hypothesis, of the values of this statistic for all possible permutations of the 
data. Fisher (1935) used this method to compare two seed samples, and Edgington 
(1987) discusses various aspects of it. “Treatments” are required to define the 
permutations in a randomization test, and the probability of obtaining a given permutation 
must also be known. Which unit will be given which treatment must be decided randomly; that 
is, the experimental design must incorporate randomization. 

In an organization like Statistics Canada, classical tests are generally used because most of 
the sample surveys done by Statistics Canada use probability sampling, and also because there 
are no treatments in these surveys. This article describes how randomization tests were used 
in a survey that was an exception to the rule. 

In Section 2, the methodology used in the modular tests is described briefly. Section 3 
describes using simple examples the procedure used in a randomization test. Section 4 describes 
how randomization tests were applied to Modular Test 2. 


2. MODULAR TESTS 


As part of the planning for the 1991 census, two modular tests were carried out to test questions 
likely to be asked in the census. The purpose of these surveys was to ensure that each 
question, whether new or just reformulated, was easy to understand. We refer to the tests as 
“modular” because they were independent surveys that tested different sections of the census 
questionnaire. 


1 Yves Béland, Social Survey Methods Division; Alain Théberge, Business Survey Methods Division, Statistics Canada, 
Tunney’s Pasture, Ottawa, Ontario, Canada, K1A 0T6 

Modular Test 1 was carried out in November 1987 in order to revise newly formulated questions 
dealing with population coverage, marital status, fertility, volunteer work, and nuptiality. 
This first survey used neither classical nor randomization tests. 

Modular Test 2, carried out in January, 1988, was designed principally to measure the reaction 
of ethnic groups to questions on language, ethnic origin, religion, citizenship, and mobility. 
In Modular Test 2, a two-stage sampling plan was used to select about 3,500 households taken 
from within the metropolitan areas of Halifax, Québec, Montréal, Toronto, Winnipeg, and 
Vancouver. To reduce costs, make data collection easier, and obtain a sample that contained 
people of diverse ethnic origins, a non-probability method was used to select the sample. The 
questionnaire used in Modular Test 2 came in two versions. The differences are described in 
Section 4. The households in the sample were given either version 1 or version 2 on a random basis. 

Randomization tests were used to allow us to statistically test hypotheses pertaining to 
Modular Test 2. Randomization tests can be used to compare two treatments applied to units 
in samples which may not be probability samples. 


3. RANDOMIZATION TESTS 


The procedure for doing a randomization test will now be described. First, the value of a 
statistic is calculated for the observed data. Next, the value of the same statistic is calculated 
for the other permutations of the data that are possible with the experimental design used. $H_0$ 
is rejected if the value of the statistic for the observed data is extreme in relation to the values 
obtained under $H_0$ for the set of permutations. 

For example, suppose there are four households. Household 1 has three persons, households 
2 and 3 have two, and household 4 has one. These households may have been chosen arbitrarily, 
but a household whose members will receive treatment Y is chosen at random. Members of 
the three other households will receive treatment X. Suppose that household 4 is selected for 
treatment Y. For household 1, the treatment succeeds for two of the three members, for 
households 2 and 3, for one of two members, and for household 4, it fails for the sole member. 
Our null hypothesis, $H_0$, states that the results are independent of the treatment used. To measure 
the impact of treatment X compared to treatment Y, the statistic $S$, giving the average number 
of successes per person for treatment X minus the average number of successes per person for 
treatment Y, is calculated. Here $S = (2 + 1 + 1)/(3 + 2 + 2) - 0/1 = 4/7$. To find out whether this value 
is significant, the values for $S$ obtained by permuting the observations are given in Table 1. 
Each observation in Table 1 shows the number of successes before the vertical bar and the 
number of members in the household after it. If a right-tailed test is used, $H_0$ is 
rejected when $\alpha = 3/12 = 0.25$, because three of the twelve permutations yield an $S$ value 
greater than or equal to 4/7, the observed value. 
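
The four-household example can be checked by brute force. The sketch below enumerates the distinct permutations of the observations over the fixed treatment labels and counts how many give a value of the statistic at least as large as the observed one; the variable and function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

# (successes, household size) for households 1 to 4
observations = [(2, 3), (1, 2), (1, 2), (0, 1)]
treatments = ["X", "X", "X", "Y"]


def statistic(obs, trt):
    """Success rate under treatment X minus success rate under treatment Y."""
    def rate(label):
        successes = sum(s for (s, _), t in zip(obs, trt) if t == label)
        persons = sum(n for (_, n), t in zip(obs, trt) if t == label)
        return Fraction(successes, persons)
    return rate("X") - rate("Y")


observed = statistic(observations, treatments)
# Households 2 and 3 have identical observations, so only 12 of the
# 24 orderings are distinct.
distinct = sorted(set(permutations(observations)))
values = [statistic(p, treatments) for p in distinct]
level = Fraction(sum(v >= observed for v in values), len(values))

if __name__ == "__main__":
    print(observed, len(distinct), level)
```

Exact fractions are used so that ties with the observed value are detected without rounding error.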

Rather than permuting the observations, we could have permuted the treatments. Table 2 
gives the results when this is done. Because only one of the four permutations yields a value 
for $S$ greater than or equal to 4/7, for a right-tailed test we again reject $H_0$ if $\alpha = 1/4 = 0.25$. 
It is not a coincidence that the results are the same. Let $n_{ki}$ denote the number of units that receive 
treatment $k$ ($k = 1, \ldots, K$) and for which the result $r_i$ ($i = 1, \ldots, I$) is observed; 
$n_{k.} = \sum_i n_{ki}$, the number of units that receive treatment $k$; $n_{.i} = \sum_k n_{ki}$, the number of units 
for which the result $r_i$ is observed; and $n_{..} = \sum_k \sum_i n_{ki}$, the total number of units. The 
number, $N_1$, of permutations of the treatments is given by 

$$N_1 = n_{..}! \Big/ \prod_k (n_{k.}!). \tag{1}$$

Table 1 
Values of the Statistic S for Each Permutation of the Observations 

Treatment                                Permutations 

X          2|3    1|2    1|2    2|3    2|3    1|2    1|2    0|1    0|1    1|2    1|2    0|1
X          1|2    2|3    1|2    1|2    0|1    2|3    0|1    2|3    1|2    1|2    0|1    1|2
X          1|2    1|2    2|3    0|1    1|2    0|1    2|3    1|2    2|3    0|1    1|2    1|2
Y          0|1    0|1    0|1    1|2    1|2    1|2    1|2    1|2    1|2    2|3    2|3    2|3

S          4/7    4/7    4/7    0      0      0      0      0      0      -4/15  -4/15  -4/15


Table 2 
Values of the Statistic S for Each Permutation of the Treatments 

Observation        Permutations 

2|3          X      X      X      Y
1|2          X      X      Y      X
1|2          X      Y      X      X
0|1          Y      X      X      X

S            4/7    0      0      -4/15

Of these $N_1$ permutations, there are $N_1^*$ for which $n_{ki}$ units are associated with treatment $k$ and 
result $r_i$ ($k = 1, 2, \ldots, K$; $i = 1, 2, \ldots, I$), where 

$$N_1^* = \prod_i (n_{.i}!) \Big/ \prod_i \prod_k (n_{ki}!). \tag{2}$$

In addition, there are $N_2$ permutations of the observations, where 

$$N_2 = n_{..}! \Big/ \prod_i (n_{.i}!). \tag{3}$$

Of these $N_2$ permutations, there are $N_2^*$ for which $n_{ki}$ units are associated with treatment $k$ and 
result $r_i$ ($k = 1, 2, \ldots, K$; $i = 1, 2, \ldots, I$), where 

$$N_2^* = \prod_k (n_{k.}!) \Big/ \prod_k \prod_i (n_{ki}!). \tag{4}$$

Because $N_1^*/N_1 = N_2^*/N_2$, the tests are equivalent. To reduce the number of calculations, it 
is preferable to permute the treatments if $N_1 < N_2$, and to permute the observations if 
$N_1 > N_2$. Dwass (1957) suggests that when there are a large number of permutations, a sample 
of permutations can be taken, and the observed value of the statistic can be compared to the 
set of values for the sample. If not all of the permutations are considered, the level of the test 
is not affected, only its power is. 
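
The four counts can be evaluated for the four-household example as a check on the equivalence of the two approaches. The sketch assumes the counts are the multinomial expressions implied by the definitions above; the function name `permutation_counts` and the table layout are hypothetical.

```python
from fractions import Fraction
from math import factorial, prod


def permutation_counts(table):
    """table[k][i] = number of units with treatment k and result i.

    Returns (N1, N1*, N2, N2*): permutations of the treatments, those
    reproducing the table, permutations of the observations, and those
    reproducing the table.
    """
    rows = [sum(r) for r in table]        # units per treatment
    cols = [sum(c) for c in zip(*table)]  # units per result
    n = sum(rows)
    cells = prod(factorial(x) for r in table for x in r)
    n1 = factorial(n) // prod(factorial(x) for x in rows)
    n2 = factorial(n) // prod(factorial(x) for x in cols)
    n1_star = prod(factorial(x) for x in cols) // cells
    n2_star = prod(factorial(x) for x in rows) // cells
    return n1, n1_star, n2, n2_star


# Results ordered as 2|3, 1|2, 0|1; rows are treatments X and Y.
example = [[1, 2, 0],
           [0, 0, 1]]
n1, n1_star, n2, n2_star = permutation_counts(example)
assert Fraction(n1_star, n1) == Fraction(n2_star, n2)
```

For this example the treatments have 4 permutations and the observations 12, so permuting the treatments is the cheaper route.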

If the permutations are sampled, the rule given above can still be applied, not to reduce the 
number of calculations, but to minimize the loss of power due to sampling. For example, Dwass 
shows that for a one-tailed test at the 0.05 level, the loss of power for a sample of 999 permutations 
is no more than 5.5%. Bradley (1968) notes that when the power of randomization tests 
and classical tests is compared, the results depend on the extent to which the requirements of the 
classical tests have been met. 

Because of the way in which randomization tests are constructed, the inference applies only 
to the effect of treatment on units in the sample, and not to the entire population. Classical 
tests, however, are based on a random sample drawn from a population that rarely matches 
the population of interest. In the present case for example, the population of interest is the 
Canadian population on Census Day, June 4, 1991. So for both types of tests, non-statistical 
arguments must be used to generalize inferences to the population of interest. 


4. THE USE OF RANDOMIZATION TESTS IN MODULAR TEST 2 


As mentioned above, there are two questionnaire versions for Modular Test 2, versions X 
and Y. Questions on ethnic identity and ethnic origin differ in the two versions. “CANADIAN” 
is a response category in version X that the respondent can select to answer the questions on 
ethnic identity and origin. In version Y, those who want to respond “CANADIAN” must write 
it out in full after selecting the category “OTHER.” 

We wanted to know whether questions on ethnic identity and origin in version X of the test 
questionnaire elicited more or fewer multiple responses than these questions in version Y. By 
a multiple response we mean any response in which more than one category has been chosen. 
We also wanted to find out what bearing the type of questionnaire had on multiplicity (the number 
of response categories selected by the respondent), and on the selection of certain response 
categories (such as “FRENCH”) for these questions. The types of questionnaire constitute 
the treatments. Because the sample for each region had its peculiarities, the randomization 
tests were done separately for each of the metropolitan areas from which the sample was 
taken. 

First of all, we generated at random a sample of 999 permutations of the questionnaire versions. 
A permutation is generated as follows. For any given region, let $N_x$ and $N_y$ represent 
the number of X and Y questionnaires respectively. Using Bebbington’s algorithm (1975), a simple 
random sample of $N_x$ households is taken from the $N_x + N_y$ households. Household members 
in this sample are then assigned version X of the questionnaire. This process is repeated 999 
times. Next, for a given question, the proportion of respondents who gave a multiple 
response is calculated for version X and for version Y. These proportions are denoted $P_x$ and $P_y$. 

Next, for each of the 999 permutations of the questionnaire versions, as well as for the initial 
observed sample, we calculated the statistic $S = P_x - P_y$. In this way we obtained 1,000 
values for $S$, which we ranked in increasing order. If more than one statistic had the same value, 
we generated a random number between 0 and 1 and used it to determine the order of statistics 
of the same value. We used the variable $\mathrm{RANKP}_{x-y}$ to represent the rank of an observed $S$ 
statistic. 

Let $\mu_x$ and $\mu_y$ represent the expected proportions of respondents who gave a multiple 
response for version X and version Y respectively. For all regions excluding Halifax we tested: 


$$H_0: \mu_x = \mu_y$$
versus 
$$H_1: \mu_x > \mu_y.$$

For Halifax, the alternative hypothesis $H_1: \mu_x < \mu_y$ was used because more multiple responses 
were expected for version Y of the questionnaire. Because “CANADIAN” was not an available 
response category on version Y of the questionnaire and because the majority of households 
selected in this region were made up of people of British origin (that is, English, Scottish, 
or Irish), members of households that received version Y would mark one or more of these 
categories. Members of households that received version X had the option of marking only the 
“CANADIAN” category. 

The critical level, $\alpha$, is calculated as follows. For the Halifax region, where $H_0$ is rejected 
if the proportion of respondents who gave a multiple response in version X is significantly lower 
than the proportion observed for Y, the critical level is $\mathrm{RANKP}_{x-y}/1000$. For all the other 
regions, where $H_0$ is rejected if the proportion for X is significantly higher than the 
proportion observed for Y, the critical level is $(1001 - \mathrm{RANKP}_{x-y})/1000$. The results are shown 
in Table 3. 
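
The sampling of permutations and the rank-based critical level can be sketched as follows. The sketch assumes each household is summarised by a hypothetical pair (multiple responses, respondents), uses Python's `random.sample` in place of Bebbington's algorithm, and breaks ties by placing the observed value at a random position among equal values. All names are illustrative.

```python
import random
from fractions import Fraction


def proportion(households):
    """Share of respondents giving a multiple response.

    households: list of (multiple_responses, respondents) pairs.
    """
    return Fraction(sum(m for m, _ in households),
                    sum(n for _, n in households))


def critical_level(hh_x, hh_y, n_perm=999, tail="right", seed=0):
    """Critical level from the rank of the observed S among n_perm + 1 values."""
    rng = random.Random(seed)
    pooled = hh_x + hh_y
    observed = proportion(hh_x) - proportion(hh_y)
    below = ties = 0
    for _ in range(n_perm):
        # Reassign version X to a simple random sample of households.
        chosen = set(rng.sample(range(len(pooled)), len(hh_x)))
        x = [h for i, h in enumerate(pooled) if i in chosen]
        y = [h for i, h in enumerate(pooled) if i not in chosen]
        s = proportion(x) - proportion(y)
        below += s < observed
        ties += s == observed
    rank = below + rng.randint(0, ties) + 1  # random position among ties
    total = n_perm + 1
    if tail == "left":
        return rank / total
    return (total + 1 - rank) / total
```

Households, not persons, are reassigned, which mirrors the way the questionnaire versions were distributed.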

Randomization tests were also used to test multiplicity (the number of response categories 
selected by the respondent) for questions on ethnic identity and origin in each of the regions, 
but this time ratios ($R_x$, $R_y$) are used instead of proportions ($P_x$, $P_y$). Ratio $R_x$ is the average 
number of response categories selected per respondent for a question in version X of the 
questionnaire, and ratio $R_y$ is the average number of response categories selected in version Y. The 
rest of the method is the same except that instead of $\mathrm{RANKP}_{x-y}$, $\mathrm{RANKR}_{x-y}$ is used, and the 
statistic $S$ is defined as $R_x - R_y$. However, because there is greater variability for the values 
of the statistic $S$ in the tests for multiplicity, a sample of 1,999 permutations was generated 
instead of 999. 

Let $F$ and $G$ represent the distribution functions of the number of response categories selected 
in version X and version Y respectively. For all the regions excluding Halifax, we test the 
hypothesis 


$$H_0: F = G$$
versus 
$$H_1: F(z) \leq G(z) \text{ for all } z \text{ and } F \neq G.$$


If $H_0$ is rejected, the number of response categories selected for an X questionnaire is said to 
be stochastically larger than the number of response categories selected for a Y questionnaire. 
For Halifax, the alternative hypothesis used is $H_1: F(z) \geq G(z)$ for all $z$ and $F \neq G$. The 
results are shown in Table 3. In the Québec region, the value of $R_y$ is less than 1 for each 
question. This is because most respondents in this region chose only one response category, and 
some respondents did not answer one or other of the questions. 

Finally, versions X and Y for Modular Test 2 were compared for some regions as to the
number of respondents who identified themselves as being of French, Italian, or British origin.
By “BRITISH”, we mean that at least one of the categories “IRISH,” “SCOTTISH,” or
“ENGLISH” was chosen. For example, if a test was done on the proportion of people selecting
“FRENCH”, $\mu_X$ and $\mu_Y$ were defined as the expected proportion of questionnaires where
the response “FRENCH” would be chosen in versions X and Y of the questionnaire. In all
regions, we tested

$$H_0: \mu_X = \mu_Y$$

versus

$$H_1: \mu_X < \mu_Y.$$

The randomization tests were done using 999 permutations. The results are shown in Table 4.
Table 3
Critical Levels for the Rate of Multiple Responses and Multiplicity

                           Multiple Response            Multiplicity
Question   Region          P_X     P_Y     α̂           R_X     R_Y          α̂

ORIGIN     HALIFAX        0.435   0.536   0.087       1.617   1.914        0.062
ORIGIN     QUEBEC         0.154   0.043   0.001       1.143   0.986        0.001
ORIGIN     MONTREAL       0.185   0.194   0.612       1.141   [illegible]  0.585
ORIGIN     TORONTO        0.127   0.122   0.393       1.124   1.125        0.495
ORIGIN     WINNIPEG       0.293   0.307   0.622       1.439   1.398        0.345
ORIGIN     VANCOUVER      0.285   0.296   0.621       1.440   1.392        0.280
IDENTITY   HALIFAX        0.220   0.335   0.035       1.244   1.502        0.029
IDENTITY   QUEBEC         0.140   0.016   0.001       1.131   0.959        0.001
IDENTITY   MONTREAL       0.159   0.125   0.063       1.075   1.044        0.186
IDENTITY   TORONTO        0.186   0.120   0.001       1.154   1.075        0.005
IDENTITY   WINNIPEG       0.224   0.195   0.248       1.253   1.208        0.298
IDENTITY   VANCOUVER      0.186   0.183   0.457       1.182   1.137        0.202

Table 4
Critical Levels for Selected Variables

Question   Variable   Region        P_X     P_Y     α̂

ORIGIN     FRENCH     QUEBEC       0.127   0.897   0.001
ORIGIN     FRENCH     MONTREAL     0.038   0.210   0.001
ORIGIN     BRITISH    HALIFAX      0.321   0.837   0.001
ORIGIN     BRITISH    MONTREAL     0.034   0.092   0.002
ORIGIN     BRITISH    TORONTO      0.085   0.135   0.003
ORIGIN     BRITISH    WINNIPEG     0.167   0.234   0.054
ORIGIN     BRITISH    VANCOUVER    0.267   0.325   0.065
IDENTITY   FRENCH     QUEBEC       0.138   0.899   0.001
IDENTITY   BRITISH    HALIFAX      0.153   0.828   0.001
IDENTITY   BRITISH    MONTREAL     0.022   0.117   0.001
IDENTITY   BRITISH    TORONTO      0.050   0.215   0.001
IDENTITY   BRITISH    WINNIPEG     0.074   0.276   0.001
IDENTITY   BRITISH    VANCOUVER    0.104   0.325   0.001
IDENTITY   ITALIAN    TORONTO      0.412   0.463   0.060
5. CONCLUSION


The results for tests on the rate of multiple responses are similar to those on multiplicity,
which is not surprising. A comparison of the critical levels for the question on ethnic origin
with those for the question on ethnic identity shows that the differences between the two
versions of the questionnaire affect the responses to the question on ethnic identity the most.

Our main reason for using randomization tests was that the sample for Modular Test 2 was
a non-probability sample. However, there are also other cases where randomization tests are
appropriate. For example, a Student's $t$ test for equality of means requires the hypothesis of
normality, and it must also be assumed that the variances are equal. These assumptions are not
needed for a randomization test. It should be kept in mind that the results of a
randomization test apply to the sample, and not necessarily to the entire population, unless
a simple random sample is used.
Variance Formulae for Composite Estimators
in Rotation Designs


PATRICK J. CANTWELL¹


ABSTRACT 


In many government surveys, respondents are interviewed a set number of times during the life of the 
survey, a practice referred to as a rotation design or repeated sampling. Often composite estimation - 
where data from the current and earlier periods of time are combined - is used to measure the level of 
a characteristic of interest. As other authors have observed, composite estimation can be used in a rota- 
tion design to decrease the variance of estimators of change in level. In this paper, simple expressions 
are derived for the variance of a general class of composite estimators for level, change in level, and average 
level over time. Considered first are ‘‘one-level’’ rotation designs, where only the current month is 
referenced in the interview. Results are developed for any sampling pattern of m interviews over a period 
of M months. Subsequently, ‘‘multi-level’’ plans are addressed. In each month one of p different groups 
is interviewed. Respondents then answer questions referring to the previous p months. Results from the 
several sections apply to a wide range of government surveys. 


KEY WORDS: Repeated sampling in surveys; Balanced designs; Month-to-month change; Yearly 
average. 


1. INTRODUCTION 


Rotation designs of various types are used in many major household surveys. The Current 
Population Survey (CPS) is conducted by the U.S. Bureau of the Census for the U.S. Bureau 
of Labor Statistics. Statistics Canada operates the Labour Force Survey (LFS). Both surveys 
yield estimates of labor force characteristics, including unemployment. In each survey, 
households are interviewed a number of times before leaving the sample. In the CPS, each 
household is ‘‘rotated in’’ for interviews in four consecutive months, rotated out of the sample 
for eight months, and finally back in for four more months. In the LFS, a participating 
household responds for six consecutive months and does not return. 

A survey with a rotation design lies somewhere between a fixed panel survey, where par- 
ticipants remain in sample indefinitely, and a survey using independent samples, where 
respondents are interviewed once and retired from sample. The total overlap of a fixed panel 
from one time period to the next can minimize the variance of estimators of change when 
measurements are positively correlated across periods. Also, certain costs are incurred only 
the first time a unit is placed in sample. However, response burden on the members of a fixed 
panel can be excessive. Using a rotation design is an attempt to realize variance or cost reduc- 
tions without overly burdening sample participants. In the CPS and the LFS, there are sample 
overlaps of 75% and 83%, respectively, from one month to the next. For more on these topics, 
see Woodruff (1963), Rao and Graham (1964), or Wolter (1979). 

Some estimators used with rotation designs are composite in nature. In order to take advan- 
tage of repeated sampling, they combine rotation group estimates obtained for the current 
month with those from prior months into a final estimator. 


¹ Patrick J. Cantwell, Mathematical Statistician, Statistical Research Division, Bureau of the Census, Washington,
DC 20233, USA. This paper reports research undertaken by a member of the Census Bureau’s staff. The views 
expressed are attributable to the author and do not necessarily reflect those of the Census Bureau. 
While the variance of composite estimators can be decreased by selecting the combination
wisely, calculating this variance may become more complex because of the correlation pat- 
terns involved among the repeated groups. For general rotation plans, subject to specific restric- 
tions, simple formulae are presented in this paper for the variance of estimators of level and 
change. The derivations are applied to an important and quite general class of estimators called 
the generalized composite estimator (Breau and Ernst 1983). 

These formulae can be of use if the correlations between estimates from the same rotation
group one or more time periods apart can be estimated and are sufficiently large to render
composite estimation worthwhile. In continuing government surveys, past sample data will typically
enable the estimation of these correlations. Characteristics involving household income and 
labor force usually exhibit moderately high correlations. For others, such as the incidence of 
crime, however, correlations across time periods may not be large enough to realize the benefits 
of composite estimation. Of the surveys mentioned in this paper, only CPS currently uses a 
composite estimator. 

In the developments which follow, two types of surveys are treated separately. In surveys 
such as CPS and LFS, participants supply information only for the current month. Such surveys 
are called ‘‘one-level’’ surveys. On the other hand, the U.S. Census Bureau conducts the Survey 
of Income and Program Participation (SIPP) to acquire data on income level, sources of income, 
program participation, and other items. During each interview, respondents in the SIPP refer 
back to the previous four months. A different group is then interviewed the following month. 
The SIPP design is consequently called ‘‘multi-level.’’ The level of a survey was used by Wolter 
(1979) to indicate the number of periods for which information is solicited in one interview. 

Another distinction is made between these two types of surveys. Let the term “design gap”
indicate a period of time between interviews which is never referenced in any interview. While 
the LFS contains no design gaps, CPS includes one of eight months. For the sections pertaining 
to one-level designs, the results and derivations apply regardless of the pattern of interviews 
and design gaps. Therefore, the formulae are relevant not only to the current design of CPS 
and LFS, but also to other designs under consideration. 

For reasons discussed later, design gaps are generally not a feature of multi-level rotation
plans in practice. The SIPP is no exception. Accordingly, the multi-level plans addressed in 
this paper do not include design gaps. 

One-level designs are treated in Sections 2 and 3. In Section 2, the generalized composite 
estimator is defined. Notation, definitions and covariance assumptions are introduced. The 
main results - Theorems 1 through 3 - are given in Section 3. Variances of estimators of level 
and change in level are stated. The formulae are determined for single time periods (such as 
months) and combinations (such as quarters or years). They apply to one-level designs with any 
pattern of interviews and design gaps. When seeking the optimal rotation plan and composite 
estimator, the user must determine how best to combine variance reductions/increases for the 
resulting estimators of level, ‘‘month-to-month’’ change, and average over many periods. 

In Section 4, these results are extended from one-level to certain multi-level designs, which 
include the SIPP. Subject to minor restrictions - in particular, the exclusion of design gaps 
in the sampling scheme - theorems similar to those in Section 3 are stated. Because the deriva- 
tions are analogous to those for one-level plans, the results are not proved. 


2. ONE-LEVEL DESIGNS: NOTATION AND DEFINITIONS 


Although rotation schemes can assume infinitely many forms, the discussion in Sections 
2 and 3 is restricted to one type. At each period of time, a new rotation group enters the sample, 
and follows the same pattern of interviews and design gaps as every preceding group. In
addition, responses refer only to the current period of time, whether or not the participants were
in sample in the previous period. This design is called a balanced one-level rotation plan. The
design is “balanced” because the number of groups in sample at any time is equal to the total
number of time periods any one group is included in the sample.

The scheme used in the LFS satisfies these restrictions. Each month a new group enters, 
and remains in the sample for five more months. The CPS as it currently operates follows these 
guidelines in a 4-8-4 plan. Before July 1953, however, CPS used an unbalanced design where 
five rotation groups entered, one each in consecutive months. In the sixth month, no new group 
entered. The process then continued in the same manner, with groups exiting after six months 
in sample. 

One problem with the CPS design before 1953 is the introduction of month-in-sample bias, 
often referred to as rotation group bias. Of greater concern here is the changing pattern of 
rotation group appearances. The variance of a composite estimate depends on when each par- 
ticipating group appeared in sample before, and the covariance structure for identical groups 
in different months. If the pattern of appearances changes from month to month, the variance 
formula of the estimator also changes. Under a balanced design with stationary covariance 
structure, general derivations are possible. 

Throughout this paper, the word “month” refers to the period of time in which interviews
are done, partly for brevity, but also because most government surveys use the month to divide 
the life of the survey. However, the results in this section and the next apply to any period of 
time, provided the rotation plan is balanced and one-level. 

Some notation and vector definitions are now introduced. Suppose that every rotation group 
is in sample for a total of m interviews over a period of M months. That is, it is out of sample 
for M — m months after first entering and before exiting. The balanced design ensures that 
m groups are in sample during any month. 

The set $T_0$ is defined as follows. Consider any rotation group. Let $T_0$ index the set of
“months” when this group is not in sample, labeling as month one the month this group is first
interviewed, and stopping at month $M$. Because the design is balanced, the composition of $T_0$
does not depend on which group is selected. Note that, if respondents are interviewed in $m$
consecutive months, i.e., there are no design gaps, then $m$ and $M$ are the same, and $T_0$ is empty.

Next, given a set of $m$ values $w_1, \ldots, w_m$, it is possible to define the $M \times 1$ vector $w$ as
follows. Define the $i$th component of $w$ to be 0 if $i \in T_0$. This step fills $M - m$ positions in $w$.
Then the values $w_1, \ldots, w_m$ are inserted in order into the remaining $m$ components, starting
with the first. The resulting $w$ is called a vector “in design form.” For example, in a 4-8-4
rotation plan, $T_0 = \{5, 6, \ldots, 12\}$, and
$w^T = (w_1, w_2, w_3, w_4, 0, 0, 0, 0, 0, 0, 0, 0, w_5, w_6, w_7, w_8)$.

It is useful to introduce the $M \times M$ matrix $R$ as: $R_{ii} = 1$ if $i \notin T_0$, and 0 if $i \in T_0$; and
$R_{ij} = 0$ if $i \ne j$. It is clear that $R$ is a diagonal matrix where diag($R$) is a set of 1's “in design
form,” $R_{11}$ and $R_{MM}$ are 1, and $\sum_{i=1}^{M} R_{ii} = m$.

Observe that, for any $M \times p$ matrix $V$, $RV$ is the same as $V$, but with 0's across each row
$i$ such that $i$ is in $T_0$. In other words, premultiplication by $R$ “removes” (turns to 0) the rows
of $V$ indexed by $T_0$. If the columns of $V$ are already in design form, then $RV = V$. Similarly,
for any $p \times M$ matrix $U$, postmultiplication by $R$ “removes” the columns of $U$ which are
indexed by $T_0$. If the rows of $U$ are already in design form, then $UR = U$.

Let $L$ be the $M \times M$ matrix with 1's on the subdiagonal, and 0's elsewhere. Formally,
$L_{ij} = 1$, if $i - j = 1$, and 0, otherwise. For any $M \times 1$ vector written as $w^T = (w_1, \ldots, w_M)$,
the product $Lw$ becomes $(0, w_1, w_2, \ldots, w_{M-1})^T$, and $w^T L$ is $(w_2, w_3, \ldots, w_M, 0)$.
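As a concrete illustration of these definitions, the following sketch builds the index set of out-of-sample months, the matrices R and L, and a vector in design form for the 4-8-4 plan. It is a minimal example using NumPy; the function names and the 0/1 `pattern` encoding of the interview schedule are assumptions introduced here, not notation from the paper.

```python
import numpy as np

def design_matrices(pattern):
    """pattern: length-M sequence of 0/1 flags, 1 = interviewed in that month."""
    flags = np.asarray(pattern, dtype=int)
    M = flags.size
    T0 = [i + 1 for i in range(M) if flags[i] == 0]  # 1-based months out of sample
    R = np.diag(flags)               # R_ii = 1 if i is not in T0, else 0
    L = np.eye(M, k=-1, dtype=int)   # 1's on the subdiagonal, 0's elsewhere
    return T0, R, L

def design_form(values, pattern):
    """Place the m values, in order, into the in-sample positions of an M-vector."""
    flags = np.asarray(pattern, dtype=int)
    w = np.zeros(flags.size)
    w[flags == 1] = np.asarray(values, dtype=float)
    return w

# 4-8-4 plan: four months in sample, eight months out, four months in.
pattern_484 = [1] * 4 + [0] * 8 + [1] * 4
T0, R, L = design_matrices(pattern_484)
w = design_form(np.arange(1, 9), pattern_484)

print(T0)      # the out-of-sample months, 5 through 12
print(L @ w)   # w shifted down one position, with a leading 0
print(w @ L)   # w shifted up one position, with a trailing 0
```

Because `w` is already in design form, `R @ w` returns `w` unchanged, which is the property used repeatedly in the variance derivations.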
Turning to the data, let $x_{h,i}$ denote the estimate of “monthly” level for some characteristic
to be measured from the rotation group which is in sample for the $i$th time in month $h$, where
$i = 1, \ldots, m$. Breau and Ernst (1983) defined the generalized composite estimator (GCE) of
level recursively as follows. For monthly level, let:

$$y_h = \sum_{i=1}^{m} a_i x_{h,i} - k \sum_{i=1}^{m} b_i x_{h-1,i} + k y_{h-1}, \tag{1}$$

where $0 \le k < 1$, and the $a_i$'s and $b_i$'s may take any values, including negative ones,
subject to $\sum_{i=1}^{m} a_i = 1$ and $\sum_{i=1}^{m} b_i = 1$. The “current composite” and AK composite estimators
used in CPS are special cases of the GCE. For information on these, see Hanson (1978), Huang
and Ernst (1981), and Kumar and Lee (1983).

The GCE is more restrictive than a general linear estimator which combines $x_{h,i}$ values from
the current period with those from many prior months (see Gurney and Daly 1965). However,
the GCE has been shown to perform almost as well (Breau and Ernst 1983). It has the
advantage that only data from two months - the current month and the preceding one - need to be
stored. Although $y_h$ incorporates earlier data, it is summarized through $y_{h-1}$.

To facilitate variance computations, (1) is expressed in vector form. Let $a$ and $b$ be $M \times 1$
vectors in design form comprising, respectively, the sets of constants $a_1, \ldots, a_m$ and
$b_1, \ldots, b_m$. Similarly, for any $h$, the observations $x_{h,1}, \ldots, x_{h,m}$ make up $x_h$, also an $M \times 1$
vector in design form. Then

$$y_h = a^T x_h - k b^T x_{h-1} + k y_{h-1}. \tag{1a}$$


The data are assumed to exhibit a stationary covariance structure:

(i) $\mathrm{Var}(x_{h,i}) = \sigma^2$ for all $h$ and $i$;

(ii) $\mathrm{Cov}(x_{h,i}, x_{h,j}) = 0$ for $i \ne j$, i.e., different rotation groups in the
same month are uncorrelated; and

(iii) $\mathrm{Cov}(x_{h,i}, x_{s,j}) = \rho_{|h-s|}\sigma^2$ if the two $x$'s refer to the same rotation
group $|h - s|$ months apart; or 0, otherwise. Take $\rho_0$ to be 1. (2)


From the first two parts of (2), it is clear that $\mathrm{Var}(x_h) = \sigma^2 R$, for all $h$. Part three implies
that $\mathrm{Cov}(x_h, x_{h-1}) = \sigma^2 \rho_1 RLR$. This follows because (a) the matrix $L$, with 1's on the
subdiagonal, “represents” the one month lag between the $x_h$ and $x_{h-1}$ values, and (b)
premultiplying (postmultiplying) by $R$ inserts 0's corresponding to 0's in $x_h$ ($x_{h-1}$) (months not
in sample).

It is readily seen that $(L^r)_{ij} = 1$ if $i - j = r \ge 0$ and $1 \le j, i \le M$; take $L^0$ to be the
identity matrix. The same development as above gives $\mathrm{Cov}(x_h, x_{h-2}) = \sigma^2 \rho_2 RL^2R$. In
general,

$$\mathrm{Cov}(x_h, x_{h-r}) = \sigma^2 \rho_r R L^r R, \quad \text{for } r = 0, 1, 2, \ldots, \text{ and all } h. \tag{3}$$

For $r \ge M$, $L^r = 0$, and $\mathrm{Cov}(x_h, x_{h-r}) = 0$.

For the theorems which follow, define the $M \times M$ matrix $Q$ by: $Q_{ij} = k^{i-j}\rho_{i-j}$, if
$1 \le j < i \le M$, and 0, otherwise. Finally, let $I$ be the $M \times M$ identity matrix.
3. ONE-LEVEL DESIGNS: THEOREMS AND PROOFS


Three theorems are now stated and proved. 
Theorem 1. If the GCE of level is defined as in (1), and the covariance structure as expressed
in (2) holds, then

$$\mathrm{Var}(y_h) = \sigma^2 \{ a^T a + k^2 b^T (b - 2a) + 2 (a - k^2 b)^T Q (a - b) \}/(1 - k^2). \tag{4}$$

Notice that when one uses an unweighted average of the estimates from the $m$ rotation groups
of the current month, $k = 0$, $Q = 0$, and $a_i = 1/m$, for $i = 1, \ldots, m$. Then $\mathrm{Var}(y_h) = \sigma^2/m$,
as expected.


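Proof of Theorem 1. Substitution into (1a) recursively leads to

$$y_h = a^T x_h + (a - b)^T \sum_{i=1}^{\infty} k^i x_{h-i}. \tag{5}$$

From (3), the variance of this sum is

$$\begin{aligned}
\mathrm{Var}(y_h) ={}& a^T \sigma^2 R a + (a - b)^T \sum_{i=1}^{\infty} k^{2i} \sigma^2 R (a - b) \\
&+ 2 a^T \sum_{i=1}^{\infty} k^i \sigma^2 \rho_i R L^i R (a - b) \\
&+ 2 (a - b)^T \sum_{i=1}^{\infty} \sum_{j=i+1}^{\infty} k^i k^j \sigma^2 \rho_{j-i} R L^{j-i} R (a - b) \\
={}& \sigma^2 \Big[ a^T R a + (a - b)^T R (a - b) k^2/(1 - k^2) \\
&+ 2 a^T R \Big( \sum_{i=1}^{\infty} k^i \rho_i L^i \Big) R (a - b) \\
&+ 2 (a - b)^T R \Big( \sum_{i=1}^{\infty} k^{2i} \Big[ \sum_{j=i+1}^{\infty} k^{j-i} \rho_{j-i} L^{j-i} \Big] \Big) R (a - b) \Big].
\end{aligned} \tag{6}$$

Because $a$ and $a - b$ are vectors in design form, $a^T R = a^T$, $(a - b)^T R = (a - b)^T$, and
$R(a - b) = (a - b)$. The sum $\sum_{i=1}^{\infty} k^i \rho_i L^i$ is seen to be the matrix $Q$: its $ij$th entry is
$k^{i-j}\rho_{i-j}$, if $1 \le j < i \le M$, and 0, otherwise. A change of variables will show that the sum in
brackets is also $Q$. Expression (6) can be rewritten as:

$$\sigma^2 \{ a^T a + (a - b)^T (a - b) k^2/(1 - k^2) + 2 a^T Q (a - b) + 2 (a - b)^T Q (a - b) k^2/(1 - k^2) \}.$$

Simple rearrangement of these terms produces the result in (4).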
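The closed form in (4) can be checked numerically against a direct, truncated evaluation of the variance of the series in (5). The sketch below does this with NumPy. It is an illustration only: the function names are hypothetical, `rho` is assumed to be an array with `rho[r]` the lag-r correlation and `rho[0] = 1`, and the vectors `a` and `b` are assumed to be supplied in design form already.

```python
import numpy as np

def q_matrix(k, rho, M):
    """Q[i, j] = k**(i-j) * rho[i-j] for j < i, and 0 otherwise."""
    Q = np.zeros((M, M))
    for i in range(M):
        for j in range(i):
            Q[i, j] = k ** (i - j) * rho[i - j]
    return Q

def var_gce_level(a, b, k, rho, sigma2=1.0):
    """Var(y_h) from the closed form (4); a and b are in design form."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    Q = q_matrix(k, rho, a.size)
    num = a @ a + k**2 * b @ (b - 2 * a) + 2 * (a - k**2 * b) @ Q @ (a - b)
    return sigma2 * num / (1 - k**2)

def var_gce_level_series(a, b, k, rho, sigma2=1.0, n_terms=200):
    """Var(y_h) by summing the covariances of the terms in (5), truncated."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    M = a.size
    L = np.eye(M, k=-1)
    L_pow = [np.linalg.matrix_power(L, r) for r in range(M)]  # L**r = 0 for r >= M
    # Coefficient vector of x_{h-i} in (5): a for i = 0, k**i (a - b) for i >= 1.
    c = [a] + [k**i * (a - b) for i in range(1, n_terms)]
    total = 0.0
    for i in range(n_terms):
        total += c[i] @ c[i]
        for r in range(1, M):
            if i + r < n_terms:
                # R drops out because every c[i] is in design form.
                total += 2 * rho[r] * c[i] @ L_pow[r] @ c[i + r]
    return sigma2 * total
```

For a six-month plan with no design gaps, equal weights `a`, and `k = 0`, the closed form returns sigma2/6, the unweighted-average case noted after Theorem 1.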
Theorem 2. Let $y_h - y_{h-1}$ be the GCE estimator of “month-to-month” change. Then
$\mathrm{Var}(y_h - y_{h-1})$ is

(i) $2\sigma^2 a^T (I - \rho_1 L) a$, if $k = 0$, and
(ii) $\sigma^2 (a^T a + k^2 b^T b - 2k\rho_1 a^T L b)/k - (1 - k)^2 \mathrm{Var}(y_h)/k$, if $0 < k < 1$.


Proof of Theorem 2:

(i) If $k = 0$, $y_h = a^T x_h$. From (3), the variance of $a^T x_h - a^T x_{h-1}$ is
$2 a^T \sigma^2 R a - 2 a^T \sigma^2 \rho_1 RLR a = 2\sigma^2 a^T (I - \rho_1 L) a$.

(ii) If $0 < k < 1$, define $W_h$ as $a^T x_h - k b^T x_{h-1}$. From prior results, it is quickly seen that

$$\mathrm{Var}(W_h) = \sigma^2 \{ a^T a + k^2 b^T b - 2k\rho_1 a^T L b \}. \tag{7}$$

From (1a), $y_h = W_h + k y_{h-1}$. Then

$$\mathrm{Var}(y_h) = \mathrm{Var}(W_h) + k^2 \mathrm{Var}(y_{h-1}) + 2k\,\mathrm{Cov}(W_h, y_{h-1}); \tag{8}$$

the covariance term can be isolated for later use. Finally, $y_h - y_{h-1} = W_h - (1 - k) y_{h-1}$.
When computing the variance of this difference, substitution from (8) and (7) produces the
desired result.

Often of primary importance are the average level over a certain length of time (e.g., a quarter
or a year), the difference in these averages from one “year” to the next, or the difference in
“monthly” level for two months a year apart. Denote by $S_{h,t}$ the sum of the GCE's for the
last $t$ months:

$$S_{h,t} = y_h + y_{h-1} + \cdots + y_{h-t+1}, \quad t \ge 1. \tag{9}$$

Commonly used values of $t$ include three, four and twelve. It is left to the reader to divide
$S_{h,t}$ by $t$ if an average is desired rather than a sum.


Theorem 3:
(a) The expressions $S_{h,t}$, $S_{h,t} - S_{h-t,t}$, and $y_h - y_{h-t}$ can be written as $\sum_{i=0}^{\infty} v_i^T x_{h-i}$, where

(i) for $S_{h,t}$, $v_i =$
$a + [(k - k^{i+1})/(1 - k)](a - b)$, for $i = 0, 1, \ldots, t-1$,
$[(k^{i-t+1} - k^{i+1})/(1 - k)](a - b)$, for $i = t, t+1, t+2, \ldots$;

(ii) for $S_{h,t} - S_{h-t,t}$, $v_i =$
$a + [(k - k^{i+1})/(1 - k)](a - b)$, for $i = 0, 1, \ldots, t-1$,
$[(2k^{i-t+1} - k^{i+1} - k)/(1 - k)](a - b) - a$, for $i = t, t+1, \ldots, 2t-1$,
$-[k^{i-2t+1}(1 - k^t)^2/(1 - k)](a - b)$, for $i = 2t, 2t+1, \ldots$;

(iii) for $y_h - y_{h-t}$, $v_0 = a$, $v_t = k^t(a - b) - a$, and $v_i =$
$k^i(a - b)$, for $i = 1, \ldots, t-1$,
$-k^{i-t}(1 - k^t)(a - b)$, for $i = t+1, t+2, \ldots$. (10)
(b) For the sets of vectors $v_0, v_1, v_2, \ldots$ defined in (a),

$$\mathrm{Var}\Big( \sum_{i=0}^{\infty} v_i^T x_{h-i} \Big) = \sigma^2 \Big\{ \sum_{i=0}^{\infty} v_i^T v_i + 2 \sum_{i=0}^{\infty} v_i^T \sum_{n=1}^{M-1} \rho_n L^n v_{i+n} \Big\}; \tag{11}$$

the sums in (11) converge.


Proof of Theorem 3. For (a), successive inclusion of terms $y_h$ through $y_{h-t+1}$, and the
application of (5) to $y_{h-i}$ yield

$$\begin{aligned}
S_{h,t} ={}& a^T (x_h + x_{h-1} + \cdots + x_{h-t+1}) + k(a - b)^T x_{h-1} \\
&+ (k + k^2)(a - b)^T x_{h-2} + \cdots \\
&+ (k + k^2 + \cdots + k^{t-1})(a - b)^T x_{h-t+1} \\
&+ \sum_{i=t}^{\infty} (k^{i-t+1} + \cdots + k^i)(a - b)^T x_{h-i}.
\end{aligned} \tag{12}$$

The three sets of $v_i$'s are then determined from (12) and (5).

The proof of (b) is similar to that of Theorem 1, once it is seen that the $v_i$'s defined in (a),
being linear combinations of $a$ and $a - b$, are in design form. To prove convergence, note
that, for all three sets of $v_i$'s in (a), $v_i$ is proportional to $k^i(a - b)$ for $i$ sufficiently large.
There exists a constant $\lambda > 0$ such that, for $i \ge 2t$ and each component $j$, $|v_{ij}| < k^i\lambda$.
Recalling that $|\rho_i| \le 1$, and that each row of $L^n$ has at most one nonzero element (equal
to 1), the finite sum in (11) is seen to be an $M \times 1$ vector, each of whose components is bounded
above in absolute value by $k^i(M - 1)\lambda$. Convergence of the double summation then follows
geometrically in $k^{2i}$.
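As a numerical cross-check, the variance of month-to-month change obtained from Theorem 3 with t = 1 should agree with Theorem 2. The following sketch evaluates a truncated version of (11) with the vectors from (10)(iii) and compares it with the two cases of Theorem 2. The function names, the truncation length and the array convention `rho[r]` (lag-r correlation, `rho[0] = 1`) are assumptions made for the example; `a` and `b` are taken to be NumPy arrays in design form.

```python
import numpy as np

def change_vectors(a, b, k, t, n_terms):
    """Coefficient vectors v_i of x_{h-i} for y_h - y_{h-t}, following (10)(iii)."""
    d = a - b
    v = []
    for i in range(n_terms):
        if i == 0:
            v.append(a.copy())
        elif i < t:
            v.append(k**i * d)
        elif i == t:
            v.append(k**t * d - a)
        else:
            v.append(-(k ** (i - t)) * (1 - k**t) * d)
    return v

def var_linear_form(v, rho, sigma2=1.0):
    """Truncated evaluation of (11) for a finite list of vectors v."""
    M = v[0].size
    L = np.eye(M, k=-1)
    L_pow = [np.linalg.matrix_power(L, n) for n in range(M)]
    total = 0.0
    for i, vi in enumerate(v):
        total += vi @ vi
        for n in range(1, M):
            if i + n < len(v):
                total += 2 * rho[n] * vi @ L_pow[n] @ v[i + n]
    return sigma2 * total

def var_change_theorem2(a, b, k, rho, sigma2=1.0):
    """Var(y_h - y_{h-1}) from Theorem 2, using (4) for Var(y_h)."""
    M = a.size
    L = np.eye(M, k=-1)
    if k == 0:
        return 2 * sigma2 * a @ (np.eye(M) - rho[1] * L) @ a
    Q = sum((k**r * rho[r] * np.linalg.matrix_power(L, r) for r in range(1, M)),
            np.zeros((M, M)))
    var_level = sigma2 * (a @ a + k**2 * b @ (b - 2 * a)
                          + 2 * (a - k**2 * b) @ Q @ (a - b)) / (1 - k**2)
    var_w = sigma2 * (a @ a + k**2 * b @ b - 2 * k * rho[1] * a @ L @ b)
    return var_w / k - (1 - k) ** 2 * var_level / k
```

Setting `t` to 12 in `change_vectors` gives the corresponding vectors for the change in monthly level between two months a year apart, which `var_linear_form` evaluates in the same way.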


4. EXTENSION TO MULTI-LEVEL DESIGNS 


Although the results developed in Sections 2 and 3 apply to all balanced one-level rotation 
plans, it was observed that many surveys operate under multi-level designs. For example, in 
the Survey of Income and Program Participation (SIPP), one of four rotation groups is inter- 
viewed each month, and respondents supply information about the previous four months. 
Although the design is always subject to change, the first rotation group is interviewed in 
February, June, October, February, etc., for a total of eight interviews. A second group is
interviewed in March, July, etc. The remaining two groups follow the same sampling pattern,
beginning in April and May. A SIPP panel is the set of four concurrent rotation groups covering
about two and one-half years. Each year, a new panel is introduced. For example, the 1986 
panel ran from 1986 through 1988, while the 1987 panel spanned 1987-89. Data from different 
panels are not combined, even though they may cover a common year or two. For further details 
on the SIPP design, see Nelson, McMillen and Kasprzyk (1984). 

When one-level designs were addressed, a rotation group was allowed to assume any pattern 
of interviews and design gaps - intermediate months which are never referenced - provided 
the design was balanced. In a multi-level plan, however, design gaps can create problems with 
recall. Looking back several months, a respondent may find it difficult to assign an event to 
the correct period of time. Design gaps can only add to the confusion. For this reason, and
because multi-level surveys which incorporate design gaps are rare in practice, this section
considers only designs where (i) the sample comprises $p$ rotation groups, (ii) groups are interviewed
every $p$th “month” in an alternating sequence, and (iii) the period of reference is the previous
$p$ months.

Many multi-level surveys, for example, the National Crime Survey, sponsored by the U.S. 
Bureau of Justice Statistics, have a more intricate rotational pattern than that covered here. 
As expected, variance formulae applied to composite estimators would tend to be more 
complex. 

The interview of a rotation group will refer to the collective gathering of information in
the assigned month from all sample units in that group. For a particular characteristic which
is to be estimated, let $x_{h,i}$ denote the estimate of “monthly” level for month $h$ from the group
which is interviewed in month $h + i$, where $i = 1, \ldots, p$. The index $i$ measures recall time -
the amount of time between the month of reference and the interview. Table 1 depicts the
estimates $x_{h,i}$ for a four-group four-level design. In the diagram solid lines separate estimates
which are obtained in different interviews. These boundaries between the reference periods
of consecutive interviews are called “seams” in the SIPP.


Table 1
Layout of Estimates in a Longitudinal 4-Level Design

[Table body not recoverable: rows are months, columns are rotation groups 1 to 4, and each cell holds an estimate $x_{h,i}$.]

Note: $x_{h,i}$ denotes the estimate of “monthly” level for month $h$ from the group which is interviewed in month $h + i$.
Interviewing begins in month 5. Solid horizontal lines (seams) separate estimates which are obtained in different
interviews.
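The indexing of the estimates in a four-group, four-level design can be illustrated with a short sketch. It rests on an assumption that is not stated in this form in the text: group g (g = 1, ..., p) is taken to be interviewed in months p + g, 2p + g, and so on, so that the first interview falls in month p + 1 (month 5 when p = 4). The function names are hypothetical, and the printed grid is only an illustration of the indexing, not a reproduction of the published table.

```python
def recall_index(h, g, p=4):
    """Recall index i of the estimate x_{h,i} that group g supplies for month h.

    Assumes group g (1..p) is interviewed in months p+g, 2p+g, ...
    Returns None if month h is never referenced by group g.
    """
    i = (g - h - 1) % p + 1   # months from reference month h to group g's next interview slot
    if h + i < p + g:         # that slot precedes group g's first interview
        return None
    return i

def layout_rows(p=4, n_months=13):
    """One text row per month, one column per rotation group."""
    rows = []
    for h in range(1, n_months + 1):
        cells = []
        for g in range(1, p + 1):
            i = recall_index(h, g, p)
            cells.append("." if i is None else f"x({h},{i})")
        rows.append(f"{h:>5}  " + "  ".join(f"{c:<8}" for c in cells))
    return rows

for line in layout_rows():
    print(line)
```

In this grid a seam for group g falls immediately after every month whose recall index is 1, since that month is the last one covered by the corresponding interview.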
Let the vector $x_h$, defined as $(x_{h,1}, x_{h,2}, \ldots, x_{h,p})^T$, comprise the $p$ estimates for month $h$
obtained from the $p$ groups in different interviews. Note that $x_{h,p}, x_{h+1,p-1}, \ldots, x_{h+p-1,1}$
are estimates for $p$ different months obtained from one group in a single interview (in month
$h + p$).

As in Sections 2 and 3, the generalized composite estimator for monthly level is defined
as

$$y_h = \sum a_i x_{h,i} - k \sum b_i x_{h-1,i} + k y_{h-1}, \tag{13}$$

where the summations now range from 1 to $p$. Defining $a$ and $b$ as $(a_1, \ldots, a_p)^T$ and
$(b_1, \ldots, b_p)^T$, respectively, the GCE can again be written as

$$y_h = a^T x_h - k b^T x_{h-1} + k y_{h-1}.$$


The covariance structure of the monthly rotation group estimates is assumed to be stationary
in time. Under this multi-level design, however, the length of time between the target month
$h$ and the corresponding interview in month $h + i$ may affect the variability of the response,
$x_{h,i}$. For $i = 1, \ldots, p$, let $d_i^2$ represent the response variability as a function of the amount
of time between the reference month and the interview. The following covariance structure
is postulated:

(i) $\mathrm{Var}(x_{h,i}) = d_i^2\sigma^2$ for all $h$ and $i$, where $d_i > 0$;

(ii) $\mathrm{Cov}(x_{h,i}, x_{h,j}) = 0$ for $i \ne j$; and

(iii) For $r \ge 0$: $\mathrm{Cov}(x_{h,i}, x_{h-r,j}) = \rho_{r,i} d_i d_j \sigma^2$, if the two $x$'s refer to the
same group $r$ months apart; or 0, otherwise. Take $\rho_{0,i}$ to be 1 for all $i$. (14)

It may well be that $d_1 \le d_2 \le \cdots \le d_p$, if response variability increases with recall time.
The subscript $r$ in the correlation coefficient $\rho_{r,i}$ is the amount of time between the months
referenced by estimates $x_{h,i}$ and $x_{h-r,j}$. The subscript $i$ indicates that the estimate for month
$h$ is obtained from an interview $i$ months later. For specified values of $h$, $r$ and $i$, there is only
one value $j$, $1 \le j \le p$, for which the estimates $x_{h,i}$ and $x_{h-r,j}$ refer to the same panel and
$\mathrm{Cov}(x_{h,i}, x_{h-r,j})$ is nonzero. (This value is $j = \mathrm{mod}_p(i + r - 1) + 1$, where $\mathrm{mod}_p(n)$ is the
value of the integer $n$, modulo $p$.) Otherwise, the covariance is 0. In some cases, it may be
appropriate to replace $\rho_{r,1}, \ldots, \rho_{r,p}$ with a common $\rho_r$.

No assumptions are made about bias. In addition to the effect of recall on variances of group 
estimates as postulated in (14), a bias related to recall time might also be incurred. Another 
source - time-in-sample bias - can result according to the number of times a respondent has 
been interviewed (Bailar 1975). Although these biases need not be measured to derive the 
variance formulae given in this section, they might constitute a nontrivial component of mean 
squared error. 

Define the $p \times p$ matrices $D$, $P_r$, and $J$ as follows. Let $D$ and $P_r$, for $r \ge 0$, be diagonal
matrices with $d_1, \ldots, d_p$ and $\rho_{r,1}, \ldots, \rho_{r,p}$, respectively, along the diagonal. Define $J$ as:
$J_{i,i+1} = 1$ for $i = 1, 2, \ldots, p - 1$; $J_{p1} = 1$; and $J_{ij} = 0$, otherwise. The powers of $J$ form
a cycle with $J^p = I$, where $I$ is the $p \times p$ identity matrix. An argument similar to that in
Section 2 leads to $\mathrm{Var}(x_h) = \sigma^2 D^2$ for all $h$, and, in general, $\mathrm{Cov}(x_h, x_{h-r}) = \sigma^2 D P_r J^r D$, for
$r = 0, 1, 2, \ldots$, and all $h$.




Finally, define the matrix $Z$ as $\sum_{n=1}^{\infty} k^n P_n J^n$. For general $p$, $i$, and $j$, it can be shown that
the $ij$th cell $Z_{ij}$ is an infinite sum of terms:

$$Z_{ij} = \sum_{m=0}^{\infty} k^u \rho_{u,i}, \quad \text{where } u = pm + 1 + \mathrm{mod}_p(p - i + j - 1).$$

Because the $\rho$ values represent correlation coefficients, it follows easily that $Z$ is finite.
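The cyclic matrix J and a truncated version of Z can be formed directly, which makes it easy to confirm the cell-by-cell expression for Z numerically. The sketch below is illustrative only: the function names and the truncation length are hypothetical, and `rho` is assumed to be a callable returning the correlation for lag n and recall index i (1-based).

```python
import numpy as np

def cyclic_j(p):
    """J[i, i+1] = 1 for i < p and J[p, 1] = 1 (1-based indices), zeros elsewhere."""
    return np.roll(np.eye(p), 1, axis=1)

def z_matrix(k, rho, p, n_terms=500):
    """Truncated Z = sum over n >= 1 of k**n P_n J**n."""
    J = cyclic_j(p)
    Z = np.zeros((p, p))
    J_n = np.eye(p)
    for n in range(1, n_terms + 1):
        J_n = J_n @ J                                   # J**n
        P_n = np.diag([rho(n, i) for i in range(1, p + 1)])
        Z += k**n * P_n @ J_n
    return Z

def z_cell(k, rho, p, i, j, n_terms=200):
    """Z_ij from the cell formula, with u = p*m + 1 + mod_p(p - i + j - 1)."""
    u0 = 1 + (p - i + j - 1) % p
    return sum(k ** (p * m + u0) * rho(p * m + u0, i) for m in range(n_terms))
```

When a common geometric correlation is assumed, say rho(n, i) = r**n, each cell reduces to (k r)**u0 / (1 - (k r)**p), with u0 = 1 + mod_p(p - i + j - 1), which gives a quick check on the truncation.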


Analogous to Theorems 1, 2, and 3 proven earlier are Theorems 4, 5, and 6 presented below.
The former three allow any pattern of design gaps, but apply only to one-level designs.
Theorems 4, 5, and 6 do not permit design gaps.

The proofs of the theorems are similar to those in Section 3 and are not repeated. All results
apply to the limiting case where rotation groups have been in sample long enough to eliminate
the effect of phasing in the sample. If the $\rho_{r,i}$'s decrease rapidly with $r$, or if $k$ is relatively
small, the “steady-state” arrives within a couple of interviews.


Theorem 4. If the GCE of level is defined as in (13), and the covariance structure of (14) holds,
then

$$\mathrm{Var}(y_h) = \sigma^2 \{ a^T D^2 a + k^2 b^T D^2 (b - 2a) + 2 (a - k^2 b)^T D Z D (a - b) \}/(1 - k^2).$$
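Although the proof is not repeated in the paper, the expression in Theorem 4 can be compared numerically with a direct, truncated sum of covariances under structure (14). The sketch below does so for the simplified case of a common lag correlation, so that P_n is rho(n) times the identity. That simplification, the function names and the truncation lengths are all assumptions made for the illustration.

```python
import numpy as np

def var_level_multilevel(a, b, d, k, rho, sigma2=1.0, n_terms=300):
    """Var(y_h) from Theorem 4, with Z truncated and a common lag correlation rho(n)."""
    a, b, d = (np.asarray(v, dtype=float) for v in (a, b, d))
    p = a.size
    D = np.diag(d)
    J = np.roll(np.eye(p), 1, axis=1)      # cyclic matrix J
    Z = np.zeros((p, p))
    J_n = np.eye(p)
    for n in range(1, n_terms + 1):
        J_n = J_n @ J
        Z += k**n * rho(n) * J_n           # P_n = rho(n) * I under the assumption
    D2 = D @ D
    num = (a @ D2 @ a + k**2 * b @ D2 @ (b - 2 * a)
           + 2 * (a - k**2 * b) @ D @ Z @ D @ (a - b))
    return sigma2 * num / (1 - k**2)

def var_level_series(a, b, d, k, rho, sigma2=1.0, n_terms=120):
    """Var(y_h) by direct summation over y_h = a'x_h + (a-b)' sum_i k**i x_{h-i}."""
    a, b, d = (np.asarray(v, dtype=float) for v in (a, b, d))
    p = a.size
    D = np.diag(d)
    J = np.roll(np.eye(p), 1, axis=1)
    J_pow = [np.linalg.matrix_power(J, r) for r in range(p)]   # J**r cycles with period p
    c = [a] + [k**i * (a - b) for i in range(1, n_terms)]
    total = 0.0
    for i in range(n_terms):
        total += c[i] @ D @ D @ c[i]
        for r in range(1, n_terms - i):
            # Cov(x_{h-i}, x_{h-i-r}) = sigma2 * D P_r J**r D
            total += 2 * rho(r) * c[i] @ D @ J_pow[r % p] @ D @ c[i + r]
    return sigma2 * total
```

With `k = 0` the matrix Z vanishes and the first function returns sigma2 times a'D^2 a, the variance of a simple weighted combination of the p current estimates.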


Theorem 5. Let $y_h - y_{h-1}$ be the GCE estimator of “month-to-month” change. Then
$\mathrm{Var}(y_h - y_{h-1})$ is

(i) $2\sigma^2 a^T D (I - P_1 J) D a$, if $k = 0$, and
(ii) $\sigma^2 (a^T D^2 a + k^2 b^T D^2 b - 2k a^T D P_1 J D b)/k - (1 - k)^2 \mathrm{Var}(y_h)/k$, if $0 < k < 1$.

Theorem 6. Define $S_{h,t}$ as in (9), the sum of the GCE's for the last $t$ periods. Then $S_{h,t}$,
$S_{h,t} - S_{h-t,t}$, and $y_h - y_{h-t}$ can again be written as $\sum_{i=0}^{\infty} v_i^T x_{h-i}$, where the vectors $v_0, v_1,
v_2, \ldots$ are found in (10). For these sets of vectors,

$$\mathrm{Var}\Big( \sum_{i=0}^{\infty} v_i^T x_{h-i} \Big) = \sigma^2 \Big\{ \sum_{i=0}^{\infty} v_i^T D^2 v_i + 2 \sum_{i=0}^{\infty} v_i^T \sum_{n=1}^{\infty} D P_n J^n D v_{i+n} \Big\}; \tag{16}$$

the sums in (16) converge.
GUIDELINES FOR MANUSCRIPTS


Before having a manuscript typed for submission, please examine a recent issue (Vol. 10,
No. 2 and onward) of Survey Methodology as a guide and note particularly the following
points:

1. Layout 

1.1 Manuscripts should be typed on white bond paper of standard size (8½ × 11 inch),
one side only, entirely double spaced with margins of at least 1½ inches on all sides.

1.2 The manuscripts should be divided into numbered sections with suitable verbal titles. 

1.3. The name and address of each author should be given as a footnote on the first page 
of the manuscript. 

1.4 Acknowledgements should appear at the end of the text. 

1.5 Any appendix should be placed after the acknowledgements but before the list of 
references. 

2. Abstract

The manuscript should begin with an abstract consisting of one paragraph followed
by three to six key words. Avoid mathematical expressions in the abstract.

3. Style 

3.1 Avoid footnotes, abbreviations, and acronyms. 

3.2 Mathematical symbols will be italicized unless specified otherwise except for functional
symbols such as "exp(·)" and "log(·)", etc.

3.3. Short formulae should be left in the text but everything in the text should fit in single 
spacing. Long and important equations should be separated from the text and numbered 
consecutively with arabic numerals on the right if they are to be referred to later. 

3.4 Write fractions in the text using a solidus.

3.5 Distinguish between ambiguous characters (e.g., w, ω; o, O, 0; l, 1).

3.6 Italics are used for emphasis. Indicate italics by underlining on the manuscript. 

4. Figures and Tables 

4.1 All figures and tables should be numbered consecutively with arabic numerals, with
titles which are as nearly self explanatory as possible, at the bottom for figures and 
at the top for tables. 

4.2 They should be put on separate pages with an indication of their appropriate placement
in the text. (Normally they should appear near where they are first referred to.)

5. References

5.1 References in the text should be cited with authors’ names and the date of publication. 
If part of a reference is cited, indicate after the reference, e.g., Cochran (1977, p. 164). 

5.2 The list of references at the end of the manuscript should be arranged alphabetically
and for the same author chronologically. Distinguish publications of the same author
in the same year by attaching a, b, c to the year of publication. Journal titles should
not be abbreviated. Follow the same format used in recent issues.

MORRIS H. HANSEN


(1910-1990) 


This issue is dedicated to the memory of Morris H. Hansen, 
a pioneer, innovator and leader 
who made fundamental and lasting contributions 
to many aspects of survey methodology. 



SURVEY METHODOLOGY 


A Journal of Statistics Canada 
Volume 16, Number 2, December 1990 


CONTENTS

In This Issue

Time Series Methods in Surveys

W.A. FULLER
Analysis of Repeated Surveys

K.M. WOLTER and R.M. HARTER
Sample Maintenance Based on Peano Keys

W.R. BELL and S.C. HILLMER
The Time Series Approach to Estimation for Repeated Surveys

D. PFEFFERMANN and L. BURCK
Robust Small Area Estimation Combining Time Series and Cross-Sectional Data

D.A. BINDER and J.P. DICK
A Method for the Analysis of Seasonal ARIMA Models

D.R. BRILLINGER
Spatial-Temporal Modelling of Spatially Aggregate Birth Data

N. LANIEL and K. FYFE
Benchmarking of Economic Time Series

S. BANDYOPADHYAY
Forgot the Sampling Scheme at the Estimation Stage?

H. LEE
Estimation of Panel Correlations for the Canadian Labour Force Survey

A.R. SILBERSTEIN
First Wave Effects in the U.S. Consumer Expenditure Interview Survey

E.A. STASNY
Symmetry in Flows Among Reported Victimization Classifications with Nonresponse

Acknowledgements




Survey Methodology, December 1990
Vol. 16, No. 2, pp. 165-166 
Statistics Canada 


In This Issue 


This issue contains a special section on time series methods in surveys, a topic that has attracted 
considerable interest in recent years. Special thanks are due to W.A. Fuller and J.N.K. Rao for 
coordinating the editorial work for this section. 

The first two papers of the special section deal with the problems of sample design and 
maintenance, and estimation of various parameters of interest in repeated surveys. Fuller notes 
that repeated surveys designed to enable estimation of the parameters of the measurement error 
process can be very cost efficient. For a two-period survey with fifty percent overlap, he shows 
that generalized least square estimates of longitudinal parameters can have substantially lower 
variance than the simple estimator based only on the overlapping units. Wolter and Harter deal 
with the problem of sample maintenance for a recurring survey. The ingenious use of a Peano 
curve allows the sample maintenance to meet several desirable properties. They describe an 
application to a marketing survey. 

Bell and Hillmer discuss the underlying philosophy of the time series approach to estimation 
in repeated surveys based on the recognition of two sources of variation: time series variation 
and sampling variation. They obtain some theoretical results regarding design consistency of 
the time series estimators, and uncorrelatedness of the signal and sampling error series. They 
also observe that the use of signal extraction results from time series analysis can improve survey 
estimates by reducing their mean square error. 

For repeated surveys, better small area estimates can be obtained by combining the usual 
approach based on synthetic estimation with the use of time series models. Pfeffermann and 
Burck examine the statistical properties of such predictors. They illustrate the procedure with 
the use of data on home sale prices. 

Time series described by ARIMA regression models, with survey errors following an ARMA
process, are the subject of Binder and Dick’s paper. Such models can be applied to data from surveys
with a two-stage design where the first stage units are replaced randomly, while the second stage 
units have a rotating panel design. The authors give an example using Labour Force Survey data. 

Brillinger studies the relationship of births to time and geography using data for women aged 
25-29 in Saskatchewan. Smooth surfaces are obtained from data aggregated by census division. 
The Poisson-lognormal distribution is also fitted to the data. 

In the last paper of the special section, Laniel and Fyfe describe the problem of benchmarking 
sub-annual series and briefly review some solutions proposed in the literature. They then present 
two new methods - one based on a model for trends and the other on a model for levels - and 
discuss their suitability. 

In his paper, Bandyopadhyay proves that for a class of estimators and sampling schemes, 
one can ignore the sampling weights when estimating a ratio. He applies this to a well-known 
example to illustrate the result and makes a comparison with estimation using a ratio of
Horvitz-Thompson estimators.

In repeated surveys with rotation panels, knowledge of panel correlations is essential for certain 
statistical analyses, such as studies of composite estimators. Lee provides methodology for 
estimating correlations between panel estimates in the Canadian Labour Force Survey. 

Misdating or "telescoping" is a recognized source of errors in retrospective surveys. Silberstein
estimates telescoping effects to obtain estimates for the unbounded first wave in the U.S. 
Consumer Expenditure Interview Survey. She finds that estimates from the first wave are greater 
than estimates from subsequent waves even after accounting for telescoping effects and concludes 
that a shorter recall period for the first wave improves reporting in subsequent waves. 

Stasny presents several models for gross flows in the presence of nonresponse. The models
are divided into those with symmetric and asymmetric transition probabilities. Methods for 
obtaining parameter estimates for the various models are developed and applied to victimization 
data from the U.S. National Crime Survey. 

Finally, readers will notice that, with this issue, Survey Methodology has a new cover. The
previous cover had been in use since December 1984 (Vol. 10, No. 2). Statistics Canada is making
similar changes to all its publications to incorporate a unique logo and to create a standardized 
corporate look. 


The Editor 

Analysis of Repeated Surveys


WAYNE A. FULLER¹


ABSTRACT 


Repeated surveys in which a portion of the units are observed at more than one time point and some 
units are not observed at some time points are of primary interest. Least squares estimation for such surveys 
is reviewed. Included in the discussion are estimation procedures in which existing estimates are not revised 
when new data become available. Also considered are techniques for the estimation of longitudinal 
parameters, such as gross change tables. Estimation for a repeated survey of land use conducted by the 
U.S. Soil Conservation Service is described. The effects of measurement error on gross change estimates
are illustrated and it is shown that survey designs constructed to enable estimation of the parameters of
the measurement error process can be very efficient.


KEY WORDS: Survey sampling; Least squares; Measurement error; Gross change. 


1. INTRODUCTION 


There is considerable interest in the analysis of surveys that are repeated in time. Evidence 
of this interest is the recently published proceedings of a conference on panel surveys edited 
by Kasprzyk, Duncan, Kalton and Singh (1989), sessions at the meetings of the International 
Statistical Institute held in 1987 and 1989, and the Statistics Canada Symposium on Analysis 
of Data in Time held in October 1989. Smith and Holt (1989) at the 1989 ISI session in Paris
call this a "resurgence of interest in the design and analysis of longitudinal studies." They note
that researchers in areas such as sociology and health have long conducted panel surveys and 
cohort studies. They cite, as an example, Lazarsfeld and Fiske (1938). An example in a health 
related area is the study of Garcia, Battese, and Brewer (1975). 

Official agencies conduct many surveys, such as labor force surveys, on a regular basis. The 
output of such surveys is usually a sequence of reports, such as those on current employment 
and unemployment. Typically, very few statistics on the behavior of individual units over time 
have been reported from repeated official surveys. An example of a survey designed to produce 
longitudinal estimates is the U.S. Survey of Income and Program Participation. See Kasprzyk 
and McMillen (1987). While information on private surveys is less complete than that on 
government surveys, it seems that the most common use of repeated private surveys is also 
to produce a sequence of reports for points in time. However, the demand for longitudinal 
analysis has increased for both public and private data providers. 

The complex issues associated with repeated surveys are brought into focus when one 
attempts to develop a taxonomy for such studies. Duncan and Kalton (1987) list some seven 
objectives of surveys repeated over time. These are: 


A. To provide estimates of population parameters at distinct time points.
B. To provide estimates of population parameters summed across time.
C. To measure net change at the aggregate level.
D. To measure components of change including
   i) gross change
   ii) change for an individual
   iii) variability for an individual.
E. To aggregate individual data over time.
F. To measure the frequency, timing and duration of events.
G. To accumulate information on rare populations.


While not mentioned explicitly, several of these objectives implicitly include the estimation 
of the parameters of subject matter models. 


Duncan and Kalton also define four kinds of surveys. Their definitions are: (1) the repeated
survey, in which no attempt is made to guarantee that particular elements appear in more than
one sample; (2) the pure panel survey, in which the same elements are observed at every point 
in time; (3) the rotating panel survey, in which there is a fixed pattern under which elements 
are observed for a fixed number of times and then rotated out of the sample; and (4) the split 
panel survey, in which a pure panel survey is combined with a repeated survey or a rotating 
panel survey. Duncan and Kalton present a table in which they outline how the different kinds 
of surveys are appropriate for the different kinds of objectives. 

An institution conducting a repeated survey faces all of the usual survey problems, but the 
problems are magnified relative to a one-time survey. Maintaining quality when a survey is repeated
requires consistent field, processing, data management, and estimation procedures over
time. It is difficult to maintain cooperation over time and it is difficult to trace people who
move. Response error is present in all surveys, but repeated surveys encounter problems of
"conditioning" associated with repeated interviews. Also, response errors introduce
inconsistencies into data collected over time. Finally, the changing composition of units, such as
families, over time complicates estimation and analysis.

We shall examine only a few issues associated with repeated surveys. Our discussion is 
motivated by a large scale survey conducted by the U.S. Soil Conservation Service with the 
cooperation of Iowa State University. In Section 2 we review some of the estimation techniques 
applicable for repeated surveys. This discussion is continued in Section 3 with more emphasis 
on estimation of longitudinal parameters in panel surveys. In Section 4 we briefly describe the 
estimation procedures used in the U.S. Soil Conservation Service study. Section 5 contains 
a short description of the effects of measurement error on gross change estimates. 


2. ESTIMATION 


In this section we outline generalized least squares estimation for surveys with only a subset
of elements observed at successive times. Generalized least squares was the procedure first
considered by authors studying estimation for surveys repeated in time. Beginning with Jessen
(1942), who was influenced by Cochran (1942), these authors considered the construction of
minimum variance weights for a set of unbiased estimators available at each point in time of 
the survey. 

Jessen (1942) investigated the special case of sampling on two occasions with unequal 
numbers of observations, and studied the optimal allocation of units to overlapping and 
nonoverlapping sample groups. Patterson (1950) considered sampling on T occasions under 
several schemes of partial replacement of units. The simplest such sampling plan required the 
replacement of a fixed proportion of sampling units on each successive sampling occasion. 
Also, Patterson (1950) assumed that for a given $i$, the differences $x_{ti} - \bar{x}_t$, $t = 1, 2, \ldots$,
followed a first-order autoregressive process, where $x_{ti}$ was the value of the $i$-th population
unit at time $t$, and $\bar{x}_t$ was the corresponding finite population mean. Under the resulting error
model, he developed optimal estimators of the fixed $\bar{x}_t$ values and of the differences $\bar{x}_t - \bar{x}_{t-1}$.
He also considered the optimal estimation of $\bar{x}_t$ under generalizations of the partial
replacement plan, optimal sample size selection, and estimation with nonautoregressive errors.

Least squares procedures were considered further by Eckler (1955), Gurney and Daly (1965), 
and Jones (1980). Composite estimation was a name given to certain types of estimators. See 
Rao and Graham (1964), Graham (1973) and Wolter (1979). Battese, Hasabelnaby and Fuller 
(1989) describe the application of the least squares procedure to a farm survey conducted by 
the U.S. Department of Agriculture. 

It seems fair to say that the parameters under consideration by these authors were means 
or totals at specific time points. That is, longitudinal parameters, such as the fraction of 
individuals in a particular class at both time 1 and time 2, were not explicitly considered by 
these authors. However, as we shall see, the least squares method extends to longitudinal 
parameters. 

Linear least squares has the desirable feature that estimators for a number of characteristics 
are internally consistent. That is, the least squares estimator of Y plus the least squares estimator 
of Z is the least squares estimator of Y + Z. However, if different vectors of observations 
are used to construct different estimates, the internal consistency is destroyed. 
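
The internal consistency property follows because, for a fixed design matrix and covariance matrix, the least squares estimator is a linear function of the observation vector. A minimal numerical sketch is given below. The design matrix, covariance matrix and observation vectors are made-up illustrative values, and `gls` is a hypothetical helper name, not something taken from the surveys discussed here.

```python
import numpy as np

def gls(X, V, y):
    """Generalized least squares estimate (X'V^-1 X)^-1 X'V^-1 y."""
    Vinv_X = np.linalg.solve(V, X)                 # V^-1 X (V is symmetric)
    return np.linalg.solve(X.T @ Vinv_X, Vinv_X.T @ y)

rng = np.random.default_rng(0)

# Illustrative design: two parameters, each observed by two direct estimates.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A = rng.normal(size=(4, 4))
V = A @ A.T + 4.0 * np.eye(4)                      # arbitrary positive definite covariance

y = rng.normal(size=4)                             # direct estimates for characteristic Y
z = rng.normal(size=4)                             # direct estimates for characteristic Z

sum_of_estimates = gls(X, V, y) + gls(X, V, z)     # estimate Y and Z separately, then add
estimate_of_sum = gls(X, V, y + z)                 # estimate Y + Z directly
```

The two results agree up to floating-point error. If the estimate for Z were instead computed from a different vector of observations (for example, a subset of the rows), the weights applied to Y and Z would differ and the equality would generally fail.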

In many applied surveys it is not possible to compute the optimum least squares estimators 
for all points in time because all available information cannot be used in the estimation. First, 
it is not possible to incorporate all data from the surveys of preceding times into a least squares 
analysis for the current time because the number of variables often exceeds the number of
observations. Second, the releasing organization may be restricted in the number of times it can
revise previous estimates. This second point has been discussed by Smith and Holt (1989).

To illustrate these estimation problems, we have constructed a small example. A two-way 
table for classification at two points in time, as observed in a very large sample, is given in 
Table 1. We have given names to the categories in this table, letting the first category be 
employed and letting the second category be unemployed. We shall assume that the population 
is constant over time. If there are births and deaths, then the table would need to be increased
to a 3 × 3 table. Let us assume that we are interested in estimating the change in level from
one period to the next. Let us also assume that we are interested in the gross change table which
involves estimating the interior cells of the table. In the 2 × 2 table it is only necessary to
estimate the (1, 1) cell and the marginal proportions to define all cells of the table.

Table 1
Hypothetical proportions for two points in time

                          Time 2
Time 1         Employed   Unemployed   Total
Employed         0.91        0.02       0.93
Unemployed       0.03        0.04       0.07
Total            0.94        0.06       1.00

We assume a two-period study in which an equal number of elements are observed at each
of the two times. We assume that one half of the elements observed at the first time are also
observed at the second time. That is, of the elements observed at the second time, one half
were observed at the first time and one half are new to the sample. We take as our vector of
observations the vector containing the proportion of elements in category 1 in the one half of
the sample that is not observed the second time [denoted by $P_{E\cdot 1}$], the proportion of elements
in category 1 at time 1 in the remaining half of the sample [denoted by $P_{E\cdot 2}$], the proportion
of elements that are in category 1 at both time 1 and time 2 for the portion of the sample that
is observed at both time periods [denoted by $P_{EE}$], the proportion of the elements in category
1 at time 2 for the elements that are observed at both times [denoted by $P_{\cdot E2}$], and the
proportion of elements in category 1 at time 2 for the portion of the sample that is observed
only at time 2 [denoted by $P_{\cdot E3}$].

Table 2
Covariance matrix of the vector of sample proportions,
two time points and fifty percent overlap in sample
(For a sample of size n, multiply entries by 2 and divide by n)

                  $P_{E\cdot 1}$   $P_{E\cdot 2}$   $P_{EE}$   $P_{\cdot E2}$   $P_{\cdot E3}$
$P_{E\cdot 1}$    0.0651           0                0          0                0
$P_{E\cdot 2}$    0                0.0651           0.0637     0.0358           0
$P_{EE}$          0                0.0637           0.0819     0.0546           0
$P_{\cdot E2}$    0                0.0358           0.0546     0.0564           0
$P_{\cdot E3}$    0                0                0          0                0.0564

Table 3
Variance of alternative estimation procedures
(For a sample of size n at each period, multiply entries by 2 and divide by n)

                                          Procedure
Parameter                        Simple   Restricted GLS   Full GLS
$P_{E\cdot}$                     0.0326   0.0326           0.0294
$P_{EE}$                         0.0819   0.0397           0.0374
$P_{\cdot E}$                    0.0278   0.0258           0.0255
$P_{EE}/P_{\cdot E}$             0.0290   0.0229           0.0220
$P_{\cdot E} - P_{E\cdot}$       0.0429   0.0367           0.0353

We assume simple random sampling. Then, because the statistics are sample proportions, 
it is easy to write down the covariance matrix of the vector of five estimators. A multiple of 
that covariance matrix is given in Table 2. To obtain the covariance matrix for a sample of 
size n at each time period, divide every entry in the table by n and multiply by two. In Table 3 
we give the variance of alternative estimation procedures. In the first column is the variance 
of the procedure that uses as the estimator of the first period proportion only the elements 
appearing in the first period sample. To estimate the fraction appearing in category 1 (employed) 
both at time | and time 2, the simple procedure uses only the overlap elements, and to estimate 
the number in the first category at time 2, it uses only the sample observed at time 2. Thus, 
if we have a sample of 200 elements at each time period, the first period sample of 200 elements 
is used to estimate the first probability. The 100 elements observed at both time 1 and time 
2 are used to estimate the proportion of the elements in category 1 at both time 1 and time 
2, and the 200 elements observed at time 2 are used to estimate the time 2 proportion. 
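
The entries of Table 2 can be reproduced from the Table 1 proportions by treating each direct estimate as a mean of indicator variables. The sketch below is an illustration only: it assumes simple random sampling with the finite population correction ignored, and the names `table2_covariance`, `p_e1`, `p_e2` and `p_ee` are hypothetical.

```python
import numpy as np

# Hypothetical population proportions taken from Table 1.
p_e1 = 0.93   # employed at time 1
p_e2 = 0.94   # employed at time 2
p_ee = 0.91   # employed at both times

def table2_covariance():
    """Per-element covariance matrix of the five direct estimates.

    Order: time 1 (half not re-observed), time 1 (overlap half),
    both times (overlap half), time 2 (overlap half), time 2 (new half).
    The three overlap estimates are correlated; the halves are independent.
    """
    V = np.zeros((5, 5))
    V[0, 0] = p_e1 * (1 - p_e1)
    V[4, 4] = p_e2 * (1 - p_e2)
    # Covariances of the indicators I1, I1*I2, I2 for an overlap element:
    # Cov(I1, I1*I2) = p_ee - p_e1*p_ee, Cov(I1, I2) = p_ee - p_e1*p_e2, etc.
    V[1:4, 1:4] = np.array([
        [p_e1 * (1 - p_e1),   p_ee - p_e1 * p_ee,  p_ee - p_e1 * p_e2],
        [p_ee - p_e1 * p_ee,  p_ee * (1 - p_ee),   p_ee - p_ee * p_e2],
        [p_ee - p_e1 * p_e2,  p_ee - p_ee * p_e2,  p_e2 * (1 - p_e2)],
    ])
    return V

V = table2_covariance()

# Correlation between employment status at the two times for an overlap element.
corr = V[1, 3] / np.sqrt(V[1, 1] * V[3, 3])

# Covariance matrix for n = 200 elements per period (each half has n/2 elements).
n = 200
V_n = V * 2 / n

# Variances of the simple estimators described in the text for n = 200.
var_simple_time1 = V[0, 0] / n        # all 200 first-period elements
var_simple_both = V[2, 2] / (n / 2)   # only the 100 overlap elements
var_simple_time2 = V[4, 4] / n        # all 200 second-period elements
```

The scaling by 2/n reflects that each of the five direct estimates is based on a half-sample of n/2 elements.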




The last column is the variance of the best linear unbiased estimators constructed using 
generalized least squares. The estimators are constructed from the vector of five basic statistics 
and the covariance matrix of that vector. This estimator is of the form 


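$$\hat{\beta} = (X' V^{-1} X)^{-1} X' V^{-1} Y, \qquad (1)$$

where $V$ is given in Table 2, $\beta' = (P_{E\cdot}, P_{\cdot E}, P_{EE})$,

$$X' = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{pmatrix},$$

and $Y$ is the five-dimensional vector of direct estimates,

$$Y' = (P_{E\cdot 1}, P_{E\cdot 2}, P_{EE}, P_{\cdot E2}, P_{\cdot E3}).$$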
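
As a check on the last column of Table 3, the covariance matrix of the generalized least squares estimator in (1) can be computed directly as $(X'V^{-1}X)^{-1}$. The sketch below copies $V$ from Table 2 and $X$ from the definition above; the function name `gls_covariance` and the contrast vector `change` are illustrative assumptions.

```python
import numpy as np

# Covariance matrix of the five direct estimates (Table 2).
V = np.array([
    [0.0651, 0.0,    0.0,    0.0,    0.0],
    [0.0,    0.0651, 0.0637, 0.0358, 0.0],
    [0.0,    0.0637, 0.0819, 0.0546, 0.0],
    [0.0,    0.0358, 0.0546, 0.0564, 0.0],
    [0.0,    0.0,    0.0,    0.0,    0.0564],
])

# Design matrix: columns are (time-1 proportion, time-2 proportion, both-times proportion).
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

def gls_covariance(X, V):
    """Covariance matrix (X'V^-1 X)^-1 of the GLS estimator."""
    return np.linalg.inv(X.T @ np.linalg.solve(V, X))

cov_beta = gls_covariance(X, V)
var_levels = np.diag(cov_beta)            # time 1, time 2, both times

# Variance of the estimated change in level (time 2 minus time 1).
change = np.array([-1.0, 1.0, 0.0])
var_change = change @ cov_beta @ change
```

Rounded to four decimals, these values should agree with the Full GLS entries for the two levels, the both-times proportion, and the change in level.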


The second column of Table 3 gives the variance of the restricted least squares estimators, 
where the restriction is that the estimator for the first period must be the estimator obtained 
from the initial sample. This would be an appropriate procedure if the agency never made a 
revision in the once published estimates. For example, the Bureau of Labor Statistics in the 
United States does not revise the unemployment statistics. Once released, they are the official 
estimates. Of course, the United States unemployment statistics are based on a more
complicated sample and are based on a survey that is conducted over a longer period of time than
our example.

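To describe the restricted generalized least squares estimator of Table 3, let the model be

$$Y = X\beta + e,$$

where $X$ is a fixed $n \times k$ matrix and

$$E\{ee'\} = V.$$

The generalized least squares estimator of $\beta$, with some elements of $\beta$ restricted to be certain
linear combinations of $Y$, can be constructed as follows. Consider the Lagrangian

$$(Y - X\beta)' V^{-1} (Y - X\beta) + 2 \sum_{i=1}^{b} \lambda_i (\Gamma_i \beta - g_i),$$

where $\Gamma_i$ is a fixed row vector and $b$ is the number of restrictions. The solution to this
minimization problem is defined by

$$\begin{pmatrix} X'V^{-1}X & \Gamma' \\ \Gamma & 0 \end{pmatrix} \begin{pmatrix} \hat{\beta} \\ \hat{\lambda} \end{pmatrix} = \begin{pmatrix} X'V^{-1}Y \\ g \end{pmatrix},$$

where $\lambda' = (\lambda_1, \lambda_2, \ldots, \lambda_b)$, $\Gamma' = (\Gamma_1', \Gamma_2', \ldots, \Gamma_b')$, and $g' = (g_1, g_2, \ldots, g_b)$. If we
replace $g$ by the linear combination $GY$, the equation becomes

$$\begin{pmatrix} \hat{\beta} \\ \hat{\lambda} \end{pmatrix} = \begin{pmatrix} X'V^{-1}X & \Gamma' \\ \Gamma & 0 \end{pmatrix}^{-1} \begin{pmatrix} X'V^{-1} \\ G \end{pmatrix} Y.$$

This equation defines the restricted estimator of $\beta$ as a linear function of $Y$. Hence the variance
of the estimator of $\beta$ is the upper $k \times k$ portion of

$$\begin{pmatrix} X'V^{-1}X & \Gamma' \\ \Gamma & 0 \end{pmatrix}^{-1} \begin{pmatrix} X'V^{-1} \\ G \end{pmatrix} V \begin{pmatrix} V^{-1}X & G' \end{pmatrix} \begin{pmatrix} X'V^{-1}X & \Gamma' \\ \Gamma & 0 \end{pmatrix}^{-1}.$$

This is not the only way to compute the restricted generalized least squares estimator. An
alternative estimator of level and change that leaves the previous estimator unchanged is the
composite estimator. See, for example, Wolter (1979).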
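
The restricted estimator can be illustrated with the two-period example. The sketch below solves a bordered linear system of the kind described above, under the assumption that the single restriction fixes the first-period estimate at the simple mean of the two first-period half-sample proportions. The names `restricted_gls_weights`, `Gamma` and `G` are illustrative; `V` and `X` are the Table 2 covariance matrix and the design matrix of (1).

```python
import numpy as np

V = np.array([
    [0.0651, 0.0,    0.0,    0.0,    0.0],
    [0.0,    0.0651, 0.0637, 0.0358, 0.0],
    [0.0,    0.0637, 0.0819, 0.0546, 0.0],
    [0.0,    0.0358, 0.0546, 0.0564, 0.0],
    [0.0,    0.0,    0.0,    0.0,    0.0564],
])
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

# One restriction: the first element of beta (time-1 proportion) must equal
# g = G Y, the unweighted mean of the two time-1 half-sample proportions.
Gamma = np.array([[1.0, 0.0, 0.0]])
G = np.array([[0.5, 0.5, 0.0, 0.0, 0.0]])

def restricted_gls_weights(X, V, Gamma, G):
    """Return T such that the restricted GLS estimator is beta_hat = T Y."""
    k = X.shape[1]
    b = Gamma.shape[0]
    Vinv_X = np.linalg.solve(V, X)                     # V^-1 X
    bordered = np.block([[X.T @ Vinv_X, Gamma.T],
                         [Gamma, np.zeros((b, b))]])
    rhs = np.vstack([Vinv_X.T, G])                     # stacked X'V^-1 and G
    return np.linalg.solve(bordered, rhs)[:k]          # drop the multiplier rows

T = restricted_gls_weights(X, V, Gamma, G)
cov_restricted = T @ V @ T.T                           # upper k x k block of the variance
var_levels = np.diag(cov_restricted)                   # time 1, time 2, both times

change = np.array([-1.0, 1.0, 0.0])
var_change = change @ cov_restricted @ change
```

Because the estimator is linear in $Y$, its covariance matrix is `T @ V @ T.T`; rounded to four decimals the diagonal should agree with the Restricted GLS column of Table 3.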

Several points are illustrated by this small example. First, with a correlation of 0.591 between
employment at the two time periods, the improvement in the current estimate of employment
from using generalized least squares is modest, about 10%. On the other hand, there is a very
large improvement in the variance of the estimate of $P_{EE}$ from using generalized least squares.
The variance of the generalized least squares estimator of $P_{EE}$ is about 45% of the variance
of the simple estimator. The second important point is that the use of restricted generalized
least squares to estimate $P_{\cdot E}$ and $P_{EE}$ produces estimates that are nearly as efficient as full
generalized least squares. There is about a one percent loss for the estimate of $P_{\cdot E}$ and about
a six percent loss for the estimate of $P_{EE}$.


3. LONGITUDINAL ESTIMATORS 


Recall that our definition of a pure panel survey is one in which the same elements are 
observed at every time point of data collection. The pure panel survey is possible for
observations of certain physical units, such as plots of land. In the case of surveys of human
populations, the pure panel must be considered to be a figment of the statistician’s imagination. In
the real world, a fraction of the respondents from the first time are always unavailable at the 
second time. Good reviews of procedures for missing data are given by Lepkowski (1989) and 
Little and Su (1989). Also see Little and Rubin (1987), Kalton (1983) and Madow et al. (1983). 

We have described the rotating panel survey in which the design calls for some elements 
to leave the study and some elements to enter the study at every time point at which the study 
is conducted. In this type of survey we might say that we have planned nonresponse for those 
elements that are rotated out of the sample. Thus, estimation in the presence of nonresponse 
and estimation for rotating panel surveys are related problems. 

Given that one does not obtain data from every respondent at every point in time of a 
repeated survey, one is faced with a choice among methods of handling planned and unplanned 
nonresponse. There are two simple, and common, procedures. If the interest is in following 
individuals over time, then very often the investigator retains in the study only those individuals 
that responded every time. A weighting procedure may be used to adjust the data using 
characteristics of the initial respondents and (or) external auxiliary data. This procedure is often 
used in special one-time studies of a specific population. In such situations the report on the 
study is released only after the entire study is completed. 

The second common type of estimation procedure is to construct estimates for each time 
period using the data that are available for that time period. This procedure is often used if 
the survey is repeated regularly, the results are released after each survey, no revisions are made 
in the releases, and no longitudinal estimates are produced. One-period-at-a-time estimation 
has the advantage of being very easy to compute at time t because no information from the
previous period is used in calculating the current estimators. It generally gives good estimates 
(not optimal) of the current value, but rather poor estimates of change. 

In fact, one might use both of these procedures in a single survey. The Survey of Income
and Program Participation (SIPP) conducted by the U.S. Bureau of the Census is a panel survey 
with a rotating time-of-interview with a four-month recall period. The Census Bureau provides 
a set of weights at each time of the survey that can be used to construct estimates for that point 
in time using all individuals that respond at that time point. They also provide (a) the sample 
of individuals that responded all eight times for the period 1984-1985 with weights for these 
individuals, (b) the sample of individuals that responded all four times in 1984 with an 
appropriate weight and (c) the sample of individuals that responded all four times in 1985 and 
an appropriate weight. 

We outline an estimation procedure for a panel survey with nonresponse where the analysis 
is conducted at the end of the survey. It is assumed that a reasonable fraction of the units 
respond at all time points of the survey and that longitudinal analysis is of interest. The
computational procedure consists of constructing weights for the units with complete response
records. Information from respondents with incomplete records constitutes a form of auxiliary 
information. 

The first step in the analysis is to pick a few variables that are very important to the study. 
The number of variables that can be used will depend upon the sample size. The covariance 
structure of the vector of estimates composed of the simple estimates for each of these variables 
for each type of response pattern for each point in time where the estimate is appropriate, is 
computed. The covariance structure is a function of the response-nonresponse pattern. There 
are different definitions of simple estimators. For simple random sampling, simple estimators 
are simple means. For stratified samples, one might define the original vector to include 
estimates for each stratum. Alternatively, the simple estimator for a stratified sample might 
weight the responses in each stratum for nonresponse. The vector Y used in (1) is an example 
of a vector of simple estimates. 

Given the vector of simple estimators and the estimated covariance matrix of the vector, 
improved estimators for each of the time periods are constructed by generalized least squares. 
For example, if we had a panel study with three time points, there are seven response patterns. 
These are XXX, 0XX, X0X, XX0, X00, 0X0, 00X, where X denotes response and 0 denotes 
nonresponse. If we choose two variables of interest, the vector of simple estimates will contain 
12 x 2 = 24 estimates because there are 12 group-response times associated with the seven 
response patterns. In this example, generalized least squares would be used to produce six 
estimates, the estimates for the two variables for each of the three time periods. 
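
The bookkeeping in this example can be sketched in a few lines of Python. The sketch only enumerates the response patterns and builds the 0-1 design matrix A that links the 24 simple estimates to the six parameters; with an estimated covariance matrix V of the simple estimates Y, the generalized least squares estimator has the standard form (A'V^{-1}A)^{-1}A'V^{-1}Y. The function names are illustrative and are not taken from the paper.

```python
from itertools import product

def response_patterns(n_times):
    """All response patterns with at least one response (X = response, 0 = nonresponse)."""
    return ["".join(p) for p in product("X0", repeat=n_times) if "X" in p]

def group_times(patterns):
    """(pattern, time index) pairs at which a response group yields a simple estimate."""
    return [(p, t) for p in patterns for t, c in enumerate(p) if c == "X"]

def design_matrix(cells, n_times, n_vars):
    """One row per simple estimate, one column per (time, variable) parameter."""
    rows = []
    for _, t in cells:
        for v in range(n_vars):
            row = [0] * (n_times * n_vars)
            row[t * n_vars + v] = 1   # this simple estimate targets variable v at time t
            rows.append(row)
    return rows

patterns = response_patterns(3)
cells = group_times(patterns)
A = design_matrix(cells, n_times=3, n_vars=2)
print(len(patterns), len(cells), len(A), len(A[0]))  # 7 12 24 6
```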

The generalized least squares estimators for the selected characteristics become control 
variables for a next stage of estimation. Using regression weighting methods, weights are 
constructed for the individuals that responded at all time periods. The weights are constructed 
so that the generalized least squares estimates for each time period are reproduced by the 
weighted sample of 100% respondents. That is, the time estimates for the chosen variables are 
used as controls. 
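
One common way to build such weights, given here as a sketch and not as the authors' program, is the linear regression weight w_i = 1/m + (x_i - xbar)' S^{-1} (c - xbar), where the x_i are the control variables for the m complete respondents, S is their sum of squares and products matrix about the mean, and c is the vector of control values. The toy data, the control values and the helper names below are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def regression_weights(x, controls):
    """Weights for complete respondents that sum to one and reproduce the control means."""
    m, k = len(x), len(controls)
    xbar = [sum(row[j] for row in x) / m for j in range(k)]
    dev = [[row[j] - xbar[j] for j in range(k)] for row in x]
    s = [[sum(d[i] * d[j] for d in dev) for j in range(k)] for i in range(k)]
    gap = [controls[j] - xbar[j] for j in range(k)]
    coef = solve(s, gap)
    return [1.0 / m + sum(d[j] * coef[j] for j in range(k)) for d in dev]

# Hypothetical data: two 0-1 control variables for five complete respondents.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
controls = [0.55, 0.45]   # stand-ins for the generalized least squares estimates
w = regression_weights(x, controls)
weighted_means = [sum(wi * row[j] for wi, row in zip(w, x)) for j in range(2)]
print(w, weighted_means)
```

With these weights the weighted mean of any analysis variable equals the regression estimator of its mean, which is why a single weighted data set gives internally consistent tabulations.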

The efficiency of the procedure depends upon the correlation between the chosen control 
variables and the analysis variable. If a control variable is also the analysis variable, the 
procedure will be very efficient. The procedure is less than fully efficient for the control 
variables only because a limited amount of information is used in the generalized least squares 
procedure. 

The strong advantage of the outlined procedure is that it produces a single tabulation data 
set that can be used to construct internally consistent estimates for all reporting times and for 
all gross change tables. The disadvantage is that estimates for particular points in time are less 
than fully efficient. 

The variance of the procedure can be computed by analogy to the procedures used for double 
sampling. Let Y be the characteristic of interest. For simplicity, assume a simple random sample 
at each time. We write the model to be used in estimation as 


$$Y_i = \mu_y + (X_i - \mu_x)\theta + e_i,$$
$$\mu_x = E\{X_i\},$$
$$e_i \sim \text{Ind}(0, \sigma_e^2).$$


Let $\hat{\mu}_x$ be the generalized least squares estimator of $\mu_x$. Then our estimator for the mean 
of Y is 


$$\hat{\mu}_y = \bar{y} + (\hat{\mu}_x - \bar{x})\hat{\theta},$$


where $\hat{\theta}$ is the vector of regression coefficients obtained in the regression of $Y_i$ on $X_i$ using the 
set of complete observations, and $(\bar{y}, \bar{x})$ is the mean vector for the elements observed at every 
time period. Let m be the number of complete observations. Then the variance of the estimator 
is, approximately, 


$$V\{\hat{\mu}_y\} = m^{-1}\sigma_e^2 + \theta' V\{\hat{\mu}_x\}\theta,$$


where $V\{\hat{\mu}_x\}$ is the covariance matrix of $\hat{\mu}_x$. 
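
As a numerical illustration of the estimator and its approximate variance, the following sketch evaluates both expressions for made-up inputs. All numbers (the means, the coefficient vector, the residual variance and the covariance matrix of the generalized least squares estimator) are hypothetical and serve only to show how the two terms of the variance combine.

```python
def regression_estimator(ybar, xbar, mu_x_hat, theta_hat):
    """ybar + (mu_x_hat - xbar) theta_hat for a vector of control variables."""
    return ybar + sum((mu - xb) * th for mu, xb, th in zip(mu_x_hat, xbar, theta_hat))

def approx_variance(sigma2_e, m, theta, v_mu_x):
    """m^{-1} sigma_e^2 + theta' V{mu_x_hat} theta."""
    k = len(theta)
    quad = sum(theta[i] * v_mu_x[i][j] * theta[j] for i in range(k) for j in range(k))
    return sigma2_e / m + quad

# Hypothetical inputs.
ybar, xbar = 10.0, [0.60, 0.40]
mu_x_hat = [0.55, 0.45]
theta = [0.5, 2.0]
v_mu_x = [[0.01, 0.002], [0.002, 0.0025]]

estimate = regression_estimator(ybar, xbar, mu_x_hat, theta)
variance = approx_variance(sigma2_e=4.0, m=400, theta=theta, v_mu_x=v_mu_x)
print(estimate, variance)
```

The first variance term shrinks with the number of complete observations m, while the second is driven by the precision of the control estimates, which use all respondents.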


The least squares estimator we have described will perform well in most situations. However, 
it is possible for the estimator to produce negative estimates for quantities known to be non- 
negative. This is because the estimator is linear and it is possible for some of the weights to be 
negative. Procedures have been developed to avoid this problem. See Huang and Fuller (1978). 


4. THE U.S. NATIONAL RESOURCE INVENTORY 


The Iowa State Statistical Laboratory cooperates with the U.S. Soil Conservation Service 
on a large survey of land use in the United States. The survey was conducted in 1958, 1967, 
1975, 1977, 1982, and 1987. A survey is currently being planned for 1992. 

The survey collects data on soil characteristics, land use and land cover, potential for 
converting land not used for crops to cropland, soil and water erosion, and conservation 
practices. The data are collected by employees of the Soil Conservation Service. Iowa State 
University has responsibility for sample design and for estimation. 

The sample is a stratified sample of the nonfederal area of 49 states (all except Alaska) and 
Puerto Rico. The sampling units are areas of land called segments. The segments vary in size 
from 40 acres to 640 acres. Data are collected for the entire segment on items such as urban 
land and water area. Detailed data on soil properties and land use are collected at a random 
sample of points within the segment. Generally, there are three points per segment, but 40-acre 
segments contain two points and the samples in two states contain one point per segment. Some 
data, such as total land area and area in roads, are collected on a census basis external to the 
sample survey. 

In 1982, the sample contained about 350,000 segments and nearly one million points. The 
1987 sample was composed of about 100,000 segments. The majority of the 1987 sample 
segments were a subsample of the 1982 segments. However, about 1,500 new segments were 
selected in areas of rapid urban growth. Data were collected on about 280,000 points in 1987. 

Table 4 

Illustration of estimation procedure 

                                  1987 
1982           Cropland     Other     Urban     Roads      TOTAL 
Cropland         26,243       179        13         6     26,441 
Other               771     7,114         6         2      7,893 
Urban                 0         0       623         0        623 
Roads                17         4         0     1,038      1,059 
1987 TOTAL       27,031     7,297       642     1,046     36,016 


For the first time in 1987, it was decided that longitudinal data analysis would be performed 
for the period 1982-1987. Also for the first time, it was decided that the data were to be made 
available to the state Soil Conservation Service staff so that they could perform their own 
analyses. 

In 1987, the field personnel were provided with a preprinted work sheet containing the 1982 
information for the segment. They entered the information for 1987 on the basis of field obser- 
vation and aerial photography. Field personnel were permitted to change the 1982 data if they 
found it to be incorrect. Edit and checking procedures were applied throughout the processing 
operation. 

The sample was designed to produce reasonable estimates for units called Major Land 
Resource Areas. These areas are defined on the basis of soil and cover characteristics. There 
are about 180 Major Land Resource Areas in the study area. Also the acreage estimates for 
any county were to be consistent with the total acreage of that county. There are about 3,100 
counties in the sample. Because the sample must provide consistent acreage estimates for both 
counties and Major Land Resource Areas, the basic tabulation unit is the portion of a Major 
Land Resource Area within the county. There are 5,530 of these units, which we called 
MLRAC’s. 

The design of the sample is a simple form of a panel survey in that the 1987 sample is nearly 
a subsample of the 1982 sample. It was decided to use as the control variables from the 1982 
study, the 1982 acres of 14 major land uses such as cropland, rangeland, forestland, and urban 
land. In addition, the external information, such as 1987 area in roads, and the segment infor- 
mation, such as 1987 area in urban land, is auxiliary information similar to that obtained from 
incomplete observations. 

Table 4 is a condensed version of an estimation table for one of the states in the survey. 
It contains only four uses instead of the 14 actually employed in the estimation. The entries 
in the right column are the 1982 estimates. The entries in the last row for urban land and roads 
are from the segment data and the external sources, respectively. The vector of six entries, (the 
first four entries of the last column, 1987 urban land, and 1987 roads) is a vector of totals 
corresponding to the vector of estimated means, $\hat{\mu}_x$, of Section 3. 
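
The structure of Table 4 can be checked with a short sketch: the interior cells are entered as printed, the margins are recomputed, and the six control totals described above are extracted. The variable names are illustrative.

```python
uses = ["Cropland", "Other", "Urban", "Roads"]

# Interior of Table 4: rows are 1982 use, columns are 1987 use.
table = [
    [26243, 179, 13, 6],
    [771, 7114, 6, 2],
    [0, 0, 623, 0],
    [17, 4, 0, 1038],
]

totals_1982 = [sum(row) for row in table]        # right-hand column
totals_1987 = [sum(col) for col in zip(*table)]  # bottom row

# Six controls: the four 1982 totals, 1987 urban land, and 1987 roads.
controls = totals_1982 + [totals_1987[uses.index("Urban")], totals_1987[uses.index("Roads")]]
print(totals_1982, totals_1987, controls)
```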

The internal estimates of the table are essentially least squares estimates that satisfy the six 
control totals. In the actual estimation scheme it was necessary to use imputation methods when, 
for example, a change is reported in the segment data, but there is no corresponding change 
in the point data. 

The design produced large variances for the directly estimated change in small uses such 
as urban land, farmsteads, and small water bodies. Therefore, a small area estimation scheme 
was used to construct estimates of change for the major land resource areas within counties. 
We used a computer program for small area estimation developed at Iowa State University. 
The theory for the small area estimation procedure is described in Fuller (1986). Estimated 
changes in five small land uses for each of the 5,500 MLRAC’s were constructed with the small 
area program. This procedure is essentially an allocation program in that the sum of the 
MLRAC estimates is the state estimate. Estimates for the entries in Table 4 (with 14 categories) 
were constructed for each MLRAC. 

In this estimation, the small area MLRAC estimates, the external estimate for roads, and 
the state marginals for cropland were used as controls. The final step in the estimation pro- 
cedure was the assignment of weights to the point data such that the weighted point data give 
the estimates of Table 4 for each MLRAC. 

To summarize, the final product of the estimation procedure is a tabulation data set of points 
that permits estimation of complete two-way tables of 1982-1987 land use for any identifiable 
area designation. The estimates are consistent with previous estimates for major land use 
categories for the states and are consistent with data from sources outside of the point sample. 

Generally speaking, it is not possible to obtain good variance estimates from the tabula- 
tion sample, although segment and stratum identification are given in the data set. Simple 
variance estimates computed with the point data for principal uses, such as cropland, will be 
too large because of the control on the larger 1982 sample. Proper variance estimation requires 
the use of double sampling formulas. 


5. MEASUREMENT ERROR 


Measurement error can have a very large impact on the analysis of data over time. This 
impact may be moderate in the case of simple means reported at a sequence of times. However, 
in gross change estimation and in regression estimation, measurement error can be extremely 
important. 

To illustrate the magnitude of measurement error bias in estimators of gross change, let 
us return to the simple example of Table 1. If the data were collected by a procedure such as 
that of the U.S. Census Bureau, the work of Chua and Fuller (1987) demonstrates that the 
interior cells of the two-way table will be seriously biased. Also see Abowd and Zellner (1985), 
Poterba and Summers (1985), and Singh and Rao (1990). Under the Chua-Fuller model, the 
response error at the two points in time is assumed to be independent. Also it is assumed that, 
at each time, 


$$P\{\text{response} = E \mid \text{true} = E\} = 1 - \alpha + \alpha P_E,$$
$$P\{\text{response} = U \mid \text{true} = E\} = \alpha P_U,$$
$$P\{\text{response} = U \mid \text{true} = U\} = 1 - \alpha + \alpha P_U,$$
$$P\{\text{response} = E \mid \text{true} = U\} = \alpha P_E,$$


where $\alpha$ is the parameter of the response mechanism and $P_E$ and $P_U$ are the true proportions 
employed and unemployed. Under this model the expected value for 
the proportion employed at any point in time is the true proportion. A consistent estimator 
for $P_{EE}$ under the Chua-Fuller model is 


$$\tilde{P}_{EE} = (1 - \alpha)^{-2}\{\hat{P}_{EE} - \hat{P}_{E\cdot}\hat{P}_{\cdot E}[1 - (1 - \alpha)^2]\},$$


where $\hat{P}_{EE}$, $\hat{P}_{E\cdot}$ and $\hat{P}_{\cdot E}$ are the direct estimators. 
Also see Battese and Fuller (1973). On the basis of the U.S. reinterview data, a value 
of $\alpha = 0.10$ is not unreasonable. 

Table 5 

Mean square error of alternative estimators for a sample of 10,000 at 
each time and 50% overlap 
(Mean square error of measurement error adjusted GLS = 100) 

                                 Procedure 
                         Ordinary                     Measurement Error 
Parameter          Simple  Rest. GLS  Full GLS    Simple  Rest. GLS  Full GLS 
$P_{E\cdot}$          111        111       100       111        111       100 
$P_{\cdot E}$         111        101       100       111        101       100 
$P_{EE}$             1071        967       961       250        106       100 

For our example, we have 


$$\tilde{P}_{EE} = (0.90)^{-2}\{0.91 - 0.93(0.94)(0.19)\} = 0.9184.$$


The corresponding two-way table of proportions adjusted for response error is 


    0.9184    0.0116 
    0.0216    0.0484 
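
The adjustment can be reproduced numerically. The sketch below assumes the observed two-way table implied by the direct estimates used in the text (0.91 for employed at both times, with margins 0.93 and 0.94) and applies the same correction to every cell; extending the formula from the employed-employed cell to the other cells is our reading of the model, under which an erroneous response is an independent draw from the marginal distribution. The function name is illustrative.

```python
def adjust_two_way(p_obs, alpha):
    """Adjust an observed 2x2 table [[EE, EU], [UE, UU]] for response error."""
    g = (1.0 - alpha) ** 2                      # probability both responses are correct
    row = [sum(r) for r in p_obs]               # time-one margins
    col = [sum(c) for c in zip(*p_obs)]         # time-two margins
    return [
        [(p_obs[i][j] - row[i] * col[j] * (1.0 - g)) / g for j in range(2)]
        for i in range(2)
    ]

observed = [[0.91, 0.02], [0.03, 0.04]]
adjusted = adjust_two_way(observed, alpha=0.10)
for r in adjusted:
    print([round(v, 4) for v in r])
```

The margins of the adjusted table equal those of the observed table, so only the interior cells move.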

In this example, the bias in the direct estimator of $P_{EE}$ is 0.0084. Chua and Fuller estimate the 
bias to be about 0.0168 in the three-way table that includes the not-in-the-labor-force category. 
Table 5 contains a comparison of alternative estimation procedures for $P_{EE}$. A sample 
of 10,000 is assumed. The first three procedures are those of Table 3. The last three are the 
three estimators adjusted for measurement error bias. In the variance calculations, $\alpha$ is assumed 
to have a standard error of 0.01. The estimators of $P_{E\cdot}$ and $P_{\cdot E}$ are not changed by the 
adjustment for measurement error bias. In this example, the squared bias in the ordinary estimator 
of $P_{EE}$ is about nine times the variance of the generalized least squares estimator. Thus, the 
measurement error bias dominates the mean square error of the estimator of $P_{EE}$. 

These results have serious implications for survey design. To illustrate this, we return to 
the gross change problem. Assume that our objective is to estimate the probability that a person 
will remain employed for two periods, $P_{EE}$. We assume that it is possible to conduct 
independent reinterviews for each point in time, and that interviews at two points in time are 
independent. We assume that the only interview procedures permitted are: 


A. Interview and reinterview at one of the times. 

B. Interview at time one and interview at time two. 

We assume that the response error is unbiased and that a simple two-class (employed and 
unemployed) model is appropriate. We also assume that the probabilities of correct response 
depend only on the current class of the respondent. Let the response probabilities be defined 
in terms of $\alpha$ and let 


$$\gamma = (1 - \alpha)^{-2}.$$


Let $\theta_{ij}$ denote the ij-th element of the 2 x 2 matrix of probabilities observed in the reinterview 
study. That is, $\theta_{ij}$ is the probability that an individual responds i on the first interview and j 
on the reinterview. For this simple model we can obtain explicit expressions for the estimators. 
We have 


$$\hat{\gamma} = (\hat{\theta}_{11} - \hat{\theta}_{1\cdot}^2)^{-1}(\hat{\theta}_{1\cdot} - \hat{\theta}_{1\cdot}^2)$$
and 
$$\tilde{P}_{11} = \hat{\gamma}(\hat{P}_{11} - \hat{P}_{1\cdot}\hat{P}_{\cdot 1}) + \hat{P}_{1\cdot}\hat{P}_{\cdot 1},$$
where 
$$\hat{\theta}_{1\cdot} = \hat{\theta}_{11} + \hat{\theta}_{12} = \hat{\theta}_{11} + \hat{\theta}_{21},$$


the $\hat{\theta}_{ij}$ are the estimates from the reinterview study and the $\hat{P}_{ij}$ are the estimates from the interviews 
conducted at the two time periods. 

Table 6 

MSE efficiency of MEM to direct 

                                   Sample size, n 
                           500     1,000     5,000     10,000 
MSE direct/MSE MEM        0.87      1.13      3.22       5.84 

In constructing the estimator, the reinterview study is used only to estimate the measurement 
error parameter. In fact, the reinterview study could be used in a generalized least squares 
procedure to improve the estimates of $P_{11}$, $P_{1\cdot}$ and $P_{\cdot 1}$. Under the assumption that all 
interviews are of equal cost, it can be demonstrated that about one fourth of the resources should 
be used for the reinterview study. The relative efficiency of the measurement error procedure 
to the direct biased procedure is given in Table 6. 

In small samples, the direct procedure has a smaller mean square error because of the smaller 
variance. Recall that only three fourths of the observations furnish information on $P_{EE} = P_{11}$. 
However, for samples greater than 750, the squared bias dominates the mean square error of 
the direct procedure and the consistent measurement error procedure has a smaller mean square 
error. This small example demonstrates the efficacy of surveys containing a component to 
estimate the parameters of the measurement process. 
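
The break-even sample size can be approximated from Table 6 under a simplifying assumption that is ours, not the paper's: if the variances of both procedures are proportional to 1/n and the bias of the direct procedure does not depend on n, the ratio MSE direct/MSE MEM is linear in n. The sketch below fits that line through the entries for n = 500 and n = 10,000 and solves for the sample size at which the ratio equals one.

```python
def break_even_n(n1, r1, n2, r2, target=1.0):
    """Sample size at which a ratio that is linear in n reaches the target value."""
    slope = (r2 - r1) / (n2 - n1)
    return n1 + (target - r1) / slope

# Table 6 entries for the smallest and largest sample sizes.
n_star = break_even_n(500, 0.87, 10000, 5.84)
print(round(n_star))
```

The result is close to the break-even value of about 750 quoted in the text.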


6. SUMMARY AND CONCLUSIONS 


We have reviewed some topics associated with the analysis of repeated data, without 
attempting a complete discussion of the topic. We have shown that procedures based upon 
least squares have the potential to provide large gains in efficiency. Because of size and timing 
considerations, it is not possible to include all available information in the construction of the 
least squares estimators. Thus, in practice, the statistician must choose a subset of variables 
to use in the construction of least squares weights. Estimation for a two-period survey conducted 
by the U.S. Soil Conservation Service was described. 

We illustrated the large biases that measurement error can produce in longitudinal estimates 
such as gross change estimates. We showed that measurement error methods exist that can 
be used to construct consistent estimators. The use of one fourth of the available resources 
to estimate the variance of the measurement error in order to use measurement error 
estimation methods can be justified. 


Sample Maintenance Based on Peano Keys 


KIRK M. WOLTER and RACHEL M. HARTER¹ 


ABSTRACT 


We discuss frame and sample maintenance issues that arise in recurring surveys. A new system is described 
that meets four objectives. Through time, it maintains (1) the geographical balance of a sample; (2) the 
sample size; (3) the unbiased character of estimators; and (4) the lack of distortion in estimated trends. 
The system is based upon the Peano key, which creates a fractal, space-filling curve. An example of the 
new system is presented using a national survey of establishments in the United States conducted by the 
A.C. Nielsen Company. 


KEY WORDS: Recurring surveys; Sample maintenance; Changing population units; Peano key. 


1. INTRODUCTION 


We are concerned with recurring surveys conducted over time and the maintenance they 
require. Let $U_t$ denote a survey universe at time $t$, with $t = 0$ denoting the inception of a new 
survey. We assume a probability sample of units of $U_0$ has been selected, and thus that it is 
feasible to construct unbiased (or at least consistent) estimators of the population total and 
other parameters of interest. As time goes by, we assume the universe is surveyed repeatedly 
at regular intervals of time, in part to track the "level" of the population, and in part to measure 
its "trends". A panel or a rotation sampling design is usually employed for this purpose (e.g., 
see Rao and Graham (1964) and Wolter (1979) and the references cited by those authors). In 
all such surveys of people or their institutions, which is all we concern ourselves with here, 
the composition of the universe changes with time as births, deaths, and other changes occur 
to the status of the units. The survey frame, the sampling design, and the schemes for obser- 
ving or collecting the survey data must be maintained for such change; otherwise, the sample 
may become excessively biased and cease to be representative of the universe. 

The types of maintenance issues that arise in recurring surveys depend in part on the kind 
of universe under study, in part on the choice of sampling unit, and in part on the interplay 
between the sampling unit and the universe elemental units. We shall summarize briefly the 
issues that arise in four different situations: 


(i) establishment surveys with establishment as the sampling unit; 
(ii) establishment surveys with company or some similar cluster of establishments as the 
sampling units; 
(iii) surveys of people or households with the address or housing unit as the sampling unit; and 
(iv) surveys of people or households with the household or family as the sampling unit. 


In this work, we use the words "establishment" and "company" in a generic sense. An 
establishment may be a retail store, a manufacturing plant, a school, a hospital, a golf course, or any 
other similar, single-location entity, while the corresponding company would be the corporate, 
legal entity that owns the retail store, or the school district, and so on. In some cases, of course, 
the establishment and company will be synonymous, e.g., a single, independent grocery store. 


¹ Kirk M. Wolter and Rachel M. Harter, Statistical Research Department, A.C. Nielsen Company, Nielsen Plaza, 
Northbrook IL 60062, USA. 

For case (i), the main universe dynamics include: 


• establishments arising from new construction 
• reclassified establishments from some out-of-scope category to an in-scope category 
• reclassified establishments from one in-scope category to another in-scope category 
• reclassified establishments from an in-scope category to an out-of-scope category 
• conversion of a structure from residential use to commercial use 
• conversion of a structure from commercial use to residential use 
• demolition of an existing establishment 
• establishments that move in and out of vacancy status 
• changes in the configuration of an establishment, e.g., division into two or more 
  establishments. 


Case (ii) is far more complicated than case (i), principally because sampling units are now 
clusters of elemental units. All of the issues from case (i) apply to single-establishment com- 
panies. For multi-establishment companies, we face the following additional dynamics: 


• mergers wherein two companies combine to form a new successor company 
• acquisitions wherein one company is acquired by another, with the acquiring company as 
  the sole successor company 
• joint ventures wherein two companies collaborate to form a new company that may be a 
  subsidiary to both the parent companies 
• divestitures wherein a company spins off a new and independent company 
• divestitures where a company sells parts of itself to another acquiring company. 


In a sense, case (iii) is very similar to case (i) in respect to the kinds of universe dynamics 
that may arise: 


• housing units arising from new construction 
• reclassified housing units from some out-of-scope category to an in-scope category 
• reclassified housing units from one in-scope category to another 
• reclassified housing units from an in-scope category to an out-of-scope category 
• conversions from residential to commercial 
• conversions from commercial to residential 
• demolition of an existing housing unit 
• reconfigurations of existing structures, e.g., reconfigurations of apartments within a small 
  multiunit structure. 


Note how closely these issues match those for case (i). 


Finally, case (iv) is very similar to case (ii) in terms of the composition and complexity of 
universe change. Maintenance issues include: 


• marriage, wherein a new successor family is created, possibly from whole predecessor families 
  or from part families 
• new members move into an existing family, either eliminating another family or part of a 
  family 
• divorce, wherein successor families may be created from one predecessor family 
• family members move away, either to join another existing family or to establish a new 
  family 
• births of family members 
• deaths of family members 
• a whole family moves, thus requiring tracing and perhaps altering field-work assignments. 

To handle the universe dynamics listed above, properly reflecting them in the sample, so 
that sample representativeness is retained over time, the survey organization must design and 
adopt an explicit system of maintenance. We define a sample maintenance system to be a 
sampling design and a universe updating methodology, possibly specified in the form of simple 
rules, that permit the statistician to achieve known, nonzero probabilities of inclusion for each 
of the elemental units in the population for each time period in the recurring survey, or failing 
that, to weight the survey data properly so as to achieve unbiased or consistent estimators of 
the population parameters of interest. From cases (i) through (iv) above, it is clear that a 
maintenance system must perform at least four functions: 


• give new elemental units a known, nonzero probability of selection 
• account properly for elemental units that may no longer exist in a substantive sense 
• not give elemental units multiple chances of selection into the sample; otherwise, if multiple 
  chances are given, the system must appropriately record this information so that adjustments 
  may be made in the estimation procedures 
• appropriately update the universe frame so as to facilitate and control the above activities. 


A general and necessary rule of thumb for any sample maintenance system is that the system, 
or the rules that define the system, must treat symmetrically universe changes both within and 
outside of the sample. If a proposed maintenance rule violates this rule of thumb, then there 
is risk of bias in estimators of totals and other universe parameters to be estimated. For example, 
consider two rules that might be used for case (ii) for sampling new companies created as the 
result of a divestiture. One possibility is to declare the new companies part of the sample if 
their predecessor companies were part of the sample, and otherwise, if their predecessors were 
not part of the sample, to subject the new companies to a new round of sampling. This rule 
is seen to give the new companies multiple probabilities of selection, and thus may result in 
biased estimation unless appropriate adjustments are made in the estimation procedure. (The 
adjustments we have in mind are related to the multiplicity rules studied by Monroe Sirken 
(1970) and others.) A second possibility is to declare the new companies part of the sample 
if and only if their predecessor companies were part of the sample. Because this second rule 
treats symmetrically the universe changes both within and outside of the sample, it is seen to 
result in unbiased estimation for the survey parameters of interest. 
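
The difference between the two rules can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every predecessor company was sampled with the same probability p, that any new round of sampling under the first rule is independent and also uses probability p, and that sampled new companies are weighted by 1/p with no multiplicity adjustment. The rule labels and function names are ours.

```python
def inclusion_probability(p, rule):
    """Probability that a company created by a divestiture enters the sample."""
    if rule == "inherit_or_resample":    # first rule: inherit, else face a new round of sampling
        return p + (1.0 - p) * p
    if rule == "inherit_only":           # second rule: in the sample iff the predecessor was
        return p
    raise ValueError(f"unknown rule: {rule}")

def expected_estimate(true_total, p, rule):
    """Expected value of the estimator that weights each sampled new company by 1/p."""
    return true_total * inclusion_probability(p, rule) / p

true_total, p = 1000.0, 0.1
first = expected_estimate(true_total, p, "inherit_or_resample")
second = expected_estimate(true_total, p, "inherit_only")
print(first, second)
```

Under these assumptions the first rule overstates the total for new companies by the factor 2 - p unless the weights are adjusted, while the second rule leaves the estimator unbiased.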

In designing a sample maintenance system, the statistician must be guided not only by the 
statistical properties of the resulting estimators, but also by the cost, feasibility, and customer 
acceptance of alternative rules. Some rules may require additional data collection, thus entailing 
additional cost that must be planned from the inception of a new recurring survey. Certain 
applications may actually require that additional data be collected retrospectively. This may 
be impractical, or at the very least, may entail considerable nonsampling error, thus risking 
bias. Some rules may well be feasible and cost-effective, yet may not satisfy the requirements 
of the customers or users of the survey data. 

Finally, we note that this problem of maintenance is neither new nor newly recognized; for 
example, maintenance systems have been in place for years in many of the major recurring 
surveys at Statistics Canada, the United States Bureau of the Census, and the A.C. Nielsen 
Company. Nevertheless, there is remarkably little literature on this subject. For brief discus- 
sions of some maintenance issues, see Wolter et al. (1976) for case (ii), Hanson (1978) for case 
(iii), and Ernst (1989) for case (iv). Also see the broad comments of Duncan and Kalton (1987) 
on household surveys and Colledge (1989) on business surveys. 

In the balance of this article, we focus on case (i), where the establishment is both the 
sampling and elemental unit. This is the case we face in our establishment surveys at the 
A.C. Nielsen Company. Section 2 describes one of our major surveys, the Scantrack survey, 
and the specific maintenance issues we face in that survey. We also describe some of the key 
objectives we had in designing a new maintenance system for this survey. 

The new maintenance system is based upon a parameter known in mathematics as the Peano 
key, which creates a fractal, space-filling curve. The Peano key is defined in Section 3, where 
we also provide several graphical displays for illustration purposes. We close the article in 
Section 4 by describing the rules that implement our new maintenance system. 


2. THE SCANTRACK SURVEY 


The Nielsen companies provide information from several marketing surveys. The media 
surveys, such as Nielsen Television Index and Nielsen Station Index, are based on samples of 
either housing units or households. Surveys for the packaged goods industry, including Nielsen 
Food Index, Nielsen Drug Index, and Nielsen Scantrack United States (NSUS), are based on 
samples of stores. The Single Source service, which ties together consumer purchasing behavior 
with household television viewing and retail marketing support, is based on both household 
and store samples. Although sample maintenance is an important issue to each of these surveys, 
the present discussion will focus on our Scantrack sample of grocery supermarkets, which is 
the basis for the NSUS service. The Scantrack sample includes 3,000 supermarkets, stratified 
by 50 metropolitan markets and a remaining United States stratum. Within a market, the sample 
is further stratified by major chain organizations. The frame is ordered geographically, and 
a systematic sample is selected within each stratum to achieve proper socio-economic 
representation. This sample is also representative of store age, store size, and other factors associated 
with item sales. Although a geographically ordered systematic sample is exceedingly simple 
and straightforward, the choice of this sample design is justified based on years of experience, 
as well as the results of empirical studies in which various sample designs were tested on universe 
data. 
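
A geographically ordered systematic sample within one stratum can be sketched as follows. The store records, the field name geo_key (standing in for whatever geographic ordering is used) and the stratum size are hypothetical; the sketch shows only the generic random-start, fixed-interval selection, not the production Scantrack procedure.

```python
import random

def systematic_sample(frame, n, rng=random):
    """Draw n units from an ordered frame using a random start and a fixed interval."""
    interval = len(frame) / n
    start = rng.random() * interval          # random start in [0, interval)
    picks = [min(int(start + k * interval), len(frame) - 1) for k in range(n)]
    return [frame[i] for i in picks]

# Hypothetical stratum of 20 stores, listed in arbitrary order.
stores = [{"store_id": i, "geo_key": (i * 7) % 20} for i in range(20)]

# Order the frame geographically, then take every fourth store after a random start.
frame = sorted(stores, key=lambda s: s["geo_key"])
sample = systematic_sample(frame, 5, random.Random(1990))
keys = [s["geo_key"] for s in sample]
print(keys)
```

Because the selected stores are evenly spaced along the geographic ordering, every part of the stratum is represented in the sample.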

Stores in the Scantrack sample are equipped with electronic scanners at the checkout, which 
read bar codes on packaged goods. Bar codes are called universal product codes or UPC’s. 
When the item is scanned, the transaction is entered into the store’s computer where the UPC 
is matched with the item’s price. Each week, the sample stores provide us with total sales move- 
ment and price data for every item that is scanned in the store. Since a supermarket typically 
carries over 10,000 UPC’s, we receive and process over 30 million observations per week. 

In addition to scanner data, we obtain data on promotion conditions for the items in each 
of the sample stores, including whether an item was featured in a newspaper advertisement, 
store display, or store coupon. If an item was featured, we also know the type of newspaper 
advertisement used and the location of the display within the store. 

NSUS reports include estimated sales totals for individual items and aggregates of items 
for each market and the total United States. A ratio estimator is used, with all-commodity 
volume as the auxiliary variable. All-commodity volume, or ACV, refers to total sales of all 
items in a store, usually on an annual basis. ACV tends to be highly correlated with sales of 
individual items. In addition, the NSUS reports include estimates of sales and sales rates by 
promotion condition and estimates of year-to-year sales trends. 
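
The ratio estimator described above has the textbook form: the sample ratio of item sales to ACV, multiplied by the known ACV total for the universe. The sketch below uses made-up figures and ignores stratification and any weighting used in the actual NSUS system.

```python
def ratio_estimate(sample_sales, sample_acv, universe_acv_total):
    """Estimate total item sales as (sample sales / sample ACV) x universe ACV."""
    return sum(sample_sales) / sum(sample_acv) * universe_acv_total

# Hypothetical sample of three stores: item sales and ACV in the same currency units.
item_sales = [120.0, 80.0, 200.0]
store_acv = [6.0, 4.0, 10.0]
universe_acv = 500.0

estimated_total = ratio_estimate(item_sales, store_acv, universe_acv)
print(estimated_total)  # 10000.0
```

The estimator gains precision when item sales are highly correlated with ACV, as the text notes.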

Continuous maintenance is necessary for the Scantrack sample because the national super- 
market universe of approximately 30,500 stores is not static. In a recent 12-month period, 
approximately 2,200 new supermarkets opened, and 2,450 existing stores went out of business. 
Another 170 stores were reclassified during the year. Reclassification can result from any of 
a number of changes. Some smaller grocery stores enter the Scantrack universe when their 
ACV’s surpass the $2-million-per-year threshold which defines a supermarket. A store might 
change name or location, or be expanded through remodeling. Some stores change to an 
extended or economy format, such as a superstore, warehouse store, or other nontraditional 
supermarket. In 1979, about 3,800 extended and economy stores accounted for 17% of total 
supermarket sales. By 1988, the number of extended and economy stores had grown to over 
9,000, and they accounted for almost 50% of all supermarket sales (Progressive Grocer 1989). 
Sometimes, individual stores or entire chains are acquired by another organization, which affects 
stratum definitions. 

In addition to universe changes, missing or faulty data situations arise that require substitu- 
tion of sample stores. Some selected sample stores do not scan, and some that do have incom- 
patible scanning equipment. If a store is consistently unable to provide us with usable data, 
it must be dropped from the sample. Sometimes a request for a sample change within an 
organization comes from the chain itself. Occasionally, a retailer simply refuses to cooperate. 


The principal objectives of our maintenance system for the Scantrack sample are: 


(1) the sample should maintain geographic balance through time 

(2) the system should maintain the sample size through time 

(3) the sample should adhere to principles of probability sampling so as to avoid bias in 
estimators of total sales, and 

(4) sample changes should not excessively disturb estimates of year-to-year trends.


Geographic balance is a proxy for socio-economic balance. Because different neighborhoods 
have different purchasing patterns, geographical balance is important to achieving an efficient 
sample design (i.e., low sampling variability) over a wide range of products. Furthermore,
geographic balance is an important factor in our customers’ perception of an appropriate sample. 

A sample size decrease would adversely affect the standard errors of the estimators, and 
a sample size increase would adversely affect our costs. Neither outcome is desirable. Further- 
more, contracts with chain organizations specify sample sizes and cooperation payments, and 
any changes would have to be renegotiated. This too is undesirable. 

All applications involving Scantrack data require efficient, unbiased estimators of total sales. 
Manufacturers and retailers need such data for everyday business decisions, such as how much to 
produce, how much to ship, how much to keep in inventory, and how to allocate store shelf space. 

Clients also require reliable year-to-year trend information for managing their businesses. 
Trend estimates help manufacturers assess the overall health of their businesses. Both manufac- 
turers and retailers benefit from knowing the longer-term performance of all major brands 
in all product categories. 

We describe the maintenance system that has been developed to meet these objectives in 
section 4. But first, we describe a new geographic ordering scheme in section 3. 


3. PEANO KEYS 


The Peano key is a parameter that defines a certain fractal, space-filling curve. It provides
a mapping from $R^2$ to $R^1$ such that points in $R^2$, or spatial objects, can be arranged in a unique
order (Peano order) on a list. In the application we have in mind, the spatial objects are sampling
units, and the space $R^2$ is represented by the earth's geographic coordinate system.

We obtain the Peano key by interleaving bits. See Peano (1908), Laurini (1987) and Saalfeld,
Fifield, Broome and Meixler (1988). Let $X = X_k \ldots X_3 X_2 X_1$ and $Y = Y_k \ldots Y_3 Y_2 Y_1$ represent
the longitude and latitude of an arbitrary point in k-digit binary form. Then, the corresponding
Peano key is $P = X_k Y_k \ldots X_3 Y_3 X_2 Y_2 X_1 Y_1$. Also see figure 1 for an example for
the case k = 4. Note how simple it is to calculate the value of P.


Figure 1. Creating the Peano Key by Bit Interleaving 


Given k-digit (for any finite k) latitude and longitude coordinates, the spatial "point" represented
by the value of P is actually a square in $R^2$. As k increases, the sizes of the squares decrease.
In fact, as k tends to infinity, the value of P will tend to represent a specific point in $R^2$.
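The bit interleaving described above can be sketched in a few lines of Python. This is only an illustration: the function names, and the `quantize` helper that turns a coordinate into a k-bit cell index, are hypothetical and are not part of the system described in this paper.

```python
def peano_key(x: int, y: int, k: int) -> int:
    """Interleave the k-bit integers x (longitude) and y (latitude).

    Read from the most significant bit, the result is
    X_k Y_k ... X_2 Y_2 X_1 Y_1, a 2k-bit integer.
    """
    if not (0 <= x < 2 ** k and 0 <= y < 2 ** k):
        raise ValueError("x and y must fit in k bits")
    key = 0
    for bit in range(k - 1, -1, -1):         # most significant bit first
        key = (key << 1) | ((x >> bit) & 1)  # append the X bit
        key = (key << 1) | ((y >> bit) & 1)  # append the Y bit
    return key


def quantize(value: float, low: float, high: float, k: int) -> int:
    """Map a coordinate in [low, high] to a k-bit cell index (hypothetical helper)."""
    cell = int((value - low) / (high - low) * 2 ** k)
    return min(max(cell, 0), 2 ** k - 1)     # clamp so value == high stays in range


# k = 4 example: X = 1010 and Y = 0110 interleave to 10011100.
example = peano_key(0b1010, 0b0110, 4)
```

Sorting a list of stores by `peano_key` of their quantized coordinates would then place them in Peano order; each key identifies one of the $2^k \times 2^k$ squares discussed above.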

The space-filling curve created by the values of the Peano key, P, is in the shape of a recur- 
sive N. Figure 2 illustrates the N-curve, using a grid of 1024 points. This figure displays the 
self-similarity feature of fractal images. 

The N-curve passes once and only once through each point in space, points being defined 
as squares whose size is determined by the number of digits carried in the latitude and longitude 
coordinates. The order of points on the curve (Peano order) largely preserves geographic
contiguity. Thus, Peano order facilitates proximity searches. Peano order involves a few
geographic discontinuities, such as the jump from point 516 to point 517 in figure 2, as does
any mapping from $R^2$ to $R^1$.

In the specific application we envision here, economic establishments are arranged on a list
in Peano order by means of their latitude and longitude coordinates. Probability samples of 
the establishments may be drawn systematically from the ordered list. Because the earth’s 
coordinate system is stable, there is no ambiguity in determining the list position of new 
establishments. Thus, they may be subjected to sampling too. 

To illustrate this application, see figure 3 which displays a chain of retail establishments 
in the United States. Each establishment is described by a double-letter code. This code in 
natural lexicographic order signifies the Peano order of the establishments. 

In the next section, we describe a sample maintenance system that is based upon the 
establishments’ Peano order. 
Figure 2. Peano Order Based on 1024 Points


Figure 3. Chain of Retail Establishments in Peano Order 
4. RULES FOR MAINTAINING THE SAMPLE


We describe a system for maintaining samples of retail stores, taking proper account of 
births, deaths, scanning conversions, and other changes in the status of the retail store universe. 
As stated earlier, we developed the system for applications at the A.C. Nielsen Company. 

We consider a given and arbitrary sampling stratum, say of size N, and assume the universe 
of stores in the stratum is arranged in Peano order. For example, a stratum might include all 
stores in a given metropolitan market, such as Vancouver or Montreal. Ordering by Peano 
key values will turn out to be especially well-suited to the maintenance system that follows. 
Other ordering schemes may be considered for this work so long as they are stable across time
and effectively map $R^2$ to $R^1$ in such fashion as to preserve geographic contiguity and to
assign all birth stores a unique position in the ordering.

We assume an original sample is selected systematically with equal probability from the
ordered list of stores at time t = 0. Let $U_{ij}$ denote the j-th store in the i-th possible systematic
sample, for i = 1, ..., k and j = 1, ..., $n_i$, where k is the sampling interval and $n_i$ is the size
of the i-th possible sample. If N = nk + r, r < k, then r samples will be of size $n_i = n + 1$
and $k - r$ samples of size $n_i = n$. In what follows, we shall also use the subscript "i" to
represent the sample actually selected.
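As a small illustration of the systematic design just described, the following Python sketch enumerates the k possible samples from an ordered list and draws one of them with probability 1/k. The function names are hypothetical, and the samples are indexed from 0 rather than from 1 as in the text.

```python
import random


def systematic_samples(ordered_units: list, k: int) -> list:
    """Return the k possible systematic samples from a list in Peano order.

    Sample i (0-based here) holds units i, i + k, i + 2k, ...
    """
    return [ordered_units[i::k] for i in range(k)]


def draw_systematic_sample(ordered_units: list, k: int, rng: random.Random):
    """Pick one of the k possible samples, each with probability 1/k."""
    i = rng.randrange(k)
    return i, ordered_units[i::k]


# N = 11 stores and interval k = 4, so N = nk + r with n = 2 and r = 3.
stores = list(range(11))
samples = systematic_samples(stores, 4)
sizes = [len(s) for s in samples]   # r samples of size n + 1, k - r of size n
```

With these values, three of the four possible samples contain three stores and one contains two, which matches the statement that r samples have size n + 1 and $k - r$ have size n.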

Let $P_{ij}$ denote the Peano key value associated with $U_{ij}$. Let $P_L$ and $P_U$ denote the smallest
and largest possible Peano key values within the market under study. Thus,


$$P_L \le P_{11} < P_{21} < \cdots < P_{k1} < P_{12} < \cdots \le P_U.$$


Note that we are assuming each store possesses a unique geographic location and thus a 
unique Peano key value. 

Let $Y_{tij}$ denote the value of some characteristic of $U_{ij}$ at time t. A standard unbiased
estimator of the population total, $Y_t$, is


$$\hat{Y}_{ti} = k \sum_{j=1}^{n_i} Y_{tij},$$


while the ratio estimator is given by

$$\hat{Y}_{tRi} = \hat{Y}_{ti} X_t / \hat{X}_{ti},$$


where the X-variable is a measure of size and $X_t$ and $\hat{X}_{ti}$ are analogous to $Y_t$ and $\hat{Y}_{ti}$,
respectively.
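The two estimators can be illustrated with a short Python sketch. The function names and the toy data are assumptions made for the example only.

```python
def expansion_estimate(sample_values: list, k: int) -> float:
    """Unbiased estimator of the total: k times the sample sum."""
    return k * sum(sample_values)


def ratio_estimate(sample_y: list, sample_x: list, k: int, x_total: float) -> float:
    """Ratio estimator: expansion estimate of Y, scaled by X_t / X-hat."""
    y_hat = expansion_estimate(sample_y, k)
    x_hat = expansion_estimate(sample_x, k)
    return y_hat * x_total / x_hat


# Toy population of N = 6 stores in Peano order, sampling interval k = 2.
y_pop = [3, 5, 2, 8, 7, 1]
k = 2
estimates = [expansion_estimate(y_pop[i::k], k) for i in range(k)]
mean_estimate = sum(estimates) / k   # average over the k equally likely samples
```

Averaging the expansion estimate over the k equally likely samples returns the population total, which is the sense in which the estimator is unbiased. The ratio estimator is exact whenever y is exactly proportional to x in every store, and it gains efficiency when the two are highly correlated, as ACV is with item sales.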

Define N Peano key segments, $S_{ij}$, by partitioning the range $[P_L, P_U]$ at the N store values
$P_{ij}$. We let $S_{ij} = [P_{ij}, P_{i+1,j})$, where it will be understood that $P_{k+1,j}$ represents $P_{1,j+1}$. A
special definition is needed for the final segment. If $U_{i'j'}$ is the last store in Peano order, we define
$S_{i'j'} = [P_{i'j'}, P_U] \cup [P_L, P_{11})$
so that the entire Peano range $[P_L, P_U]$ is covered by the N segments. This special definition,
which treats the Peano range as if it were on a circle, is needed later to guarantee that all store
births are given a nonzero probability of selection. Alternative segmentation schemes may be
used without defeating the statistical properties of the maintenance system.
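A minimal sketch of the segment lookup, assuming the original stores' Peano keys are held in an ascending list, is given below. The function name and the example keys are hypothetical. The wrap-around branch implements the circular treatment of the final segment.

```python
from bisect import bisect_right


def segment_index(store_keys: list, key: int) -> int:
    """Return the index of the Peano key segment that contains `key`.

    store_keys holds the Peano keys of the N original stores in ascending
    order. Segment m covers [store_keys[m], store_keys[m + 1]); the final
    segment wraps around, covering keys from the last store up to the top
    of the range and also keys below the first store.
    """
    m = bisect_right(store_keys, key) - 1   # last store key <= key, or -1
    return m if m >= 0 else len(store_keys) - 1


store_keys = [10, 25, 40, 70]               # hypothetical keys of N = 4 stores
inside = segment_index(store_keys, 30)      # falls in [25, 40)
above = segment_index(store_keys, 90)       # above the last store
below = segment_index(store_keys, 5)        # below the first store, wraps around
```

Because every possible key value maps to exactly one segment, any new store can be placed without ambiguity, whatever its location.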

Our maintenance scheme is based upon the Peano key segments. The basic idea is to view 
the systematic selection process as applying to the segments, with subsampling of stores within 
the selected segments. Thus, as a formal matter, the segment is the primary sampling unit (PSU), 
not the store. Of course, as of the time of initial sample selection, there is, by construction, 
only one store per segment. 
4.1 Birth Sampling


At a future point in time, say t’, one or more new stores may open for business. Each new 
store will be assigned its unique Peano key value, and this value will be an element of one and 
only one Peano key segment. The Peano key permits us to automatically place new stores in 
their correct and unique positions on the ordered universe list.

The simplest possible rule for sampling births is the following: 


Rule 1. A birth store is selected into the sample if and only if its Peano key value 
is an element of a selected Peano key segment. Birth stores whose Peano key values 
are elements of nonselected segments are themselves not selected. 


Given this rule, a birth store is selected with probability 1/k. This occurs because its segment, 
which is unique, is selected with probability 1/k. Unfortunately, Rule 1 does not provide good 
control of the sample size over time. 

To control the sample size, we advocate some form of subsampling within PSU's. Let
$U_{ij1}, U_{ij2}, \ldots, U_{ijB_{ij}}$ denote the stores in segment $S_{ij}$. The original store is now labeled $U_{ij1}$,
whereas $U_{ij2}, U_{ij3}, \ldots, U_{ijB_{ij}}$ are the birth stores in Peano order. The number, $B_{ij} - 1$, of
births in any given segment will be 0, 1, or 2 in most applications. Then we may subsample as
described in the following alternative rule.


Rule 1A. A birth store will be subjected to subsampling if and only if its Peano
key value is an element of a selected Peano key segment. Associate with
$U_{ij1}, U_{ij2}, \ldots, U_{ijB_{ij}}$ the probabilities $p_{ij1}, p_{ij2}, \ldots, p_{ijB_{ij}}$, where $p_{ijb} > 0$ and $\sum_b p_{ijb} = 1$.
Now choose one of the stores according to this probability measure. Subsampling
is independent from one selected segment to the next. Birth stores whose Peano
key values are elements of nonselected segments are themselves not selected.


The probabilities in Rule 1A may be equal or unequal. If unequal, they may be defined in 
proportion to some preliminary measures of size, or defined so as to accelerate or retard the 
replacement of the sample. 
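The subsampling step of Rule 1A might be sketched as follows. The function names, the store labels, and the size measures are hypothetical; the probabilities here are taken proportional to a preliminary measure of size, one of the options mentioned above.

```python
import random


def size_proportional_probs(sizes: list) -> list:
    """Subsampling probabilities proportional to a preliminary size measure."""
    total = sum(sizes)
    return [s / total for s in sizes]


def subsample_segment(stores: list, probs: list, rng: random.Random):
    """Choose exactly one store from a selected segment (Rule 1A)."""
    if len(stores) != len(probs):
        raise ValueError("one probability per store is required")
    if any(p <= 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must be positive and sum to 1")
    return rng.choices(stores, weights=probs, k=1)[0]


rng = random.Random(12345)
segment = ["original", "birth_1", "birth_2"]        # stores in one selected segment
probs = size_proportional_probs([2.0, 1.0, 1.0])    # hypothetical size measures
chosen = subsample_segment(segment, probs, rng)
```

Calling `subsample_segment` once per selected segment, with independent draws, keeps exactly one store per segment and so holds the sample size fixed.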

We observe that our principal maintenance objectives are well-satisfied by Rule 1A. First, 
the rule maintains geographic balance over time because there is always one unit selected from 
each of the originally selected segments, which themselves were geographically balanced by 
virtue of the systematic sampling design. Second, the rule maintains a constant sample size 
over time because there is always one and only one store selected from each of the originally 
selected segments. Third, the rule is in accord with strict principles of probability sampling, 
whereby probabilities of inclusion are known and nonzero, and thus unbiased estimators of 
population totals are available. Finally, by appropriate choice of the $p_{ijb}$, we may control
distortion in year-to-year trends.

The unconditional probabilities of selection are given by 


$$\pi_{ijb} = \frac{1}{k}\, p_{ijb}$$


for $b = 1, \ldots, B_{ij}$. That is, $\pi_{ijb}$ is equal to the probability of selecting the PSU times the
conditional probability of selecting the store, given the selected PSU.

Let $Y_{t'ijb}$ denote the value of the unit $U_{ijb}$, and let $Y_{t'ij\cdot}$ denote the total for the (i,j)-th
PSU. Then, the unbiased estimator of the population total $Y_{t'}$ is given by


$$\hat{Y}_{t'i} = \sum_{j=1}^{n_i} y_{t'ijb} / \pi_{ijb},$$

where $y_{t'ijb}$ is the value of the single unit selected from the (i,j)-th selected segment, with
variance

$$\mathrm{Var}(\hat{Y}_{t'i}) = \frac{1}{k} \sum_{i=1}^{k} \Big( k \sum_{j=1}^{n_i} Y_{t'ij\cdot} - Y_{t'} \Big)^2 + k \sum_{i=1}^{k} \sum_{j=1}^{n_i} \sigma^2_{t'ij}, \qquad (1)$$


where


$$\sigma^2_{t'ij} = \sum_{b=1}^{B_{ij}} p_{ijb} \Big( \frac{Y_{t'ijb}}{p_{ijb}} - Y_{t'ij\cdot} \Big)^2.$$

The first term on the right side of (1) is the variance due to the sampling of segments. This 
is the original variance in the sense that it is the variance expression that applied at the time 
of original sample selection. The second term on the right side is the variance due to subsampling 
within segments. Note that $\sigma^2_{t'ij}$ vanishes for any segment in which birth subsampling has not
occurred. Note also that the subsampling scheme achieves its minimum variance when, for
each given i and j, the probabilities $p_{ijb}$ are defined to be proportional to $Y_{t'ijb}$. In this case,
the within component of variance vanishes. For any real application, however, this proportionality
condition will be satisfied only approximately.
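The behaviour of the within-segment component can be checked numerically. The sketch below uses hypothetical function names and toy values; it computes the within-segment variance for one segment and the estimator of total under Rule 1A, with unconditional selection probability equal to the subsampling probability divided by k.

```python
def within_segment_variance(values: list, probs: list) -> float:
    """Sum over b of p_b * (Y_b / p_b - segment total)^2, for one segment."""
    total = sum(values)
    return sum(p * (y / p - total) ** 2 for y, p in zip(values, probs))


def rule_1a_estimate(selected_values: list, selected_probs: list, k: int) -> float:
    """Sum of y / pi over the selected segments, with pi = p / k."""
    return sum(y / (p / k) for y, p in zip(selected_values, selected_probs))


# One segment holding an original store (Y = 30) and one birth (Y = 10).
values = [30, 10]
equal = within_segment_variance(values, [0.5, 0.5])
proportional = within_segment_variance(values, [0.75, 0.25])

# Two selected segments, k = 4: the first was subsampled, the second was not.
total_hat = rule_1a_estimate([30, 8], [0.75, 1.0], 4)
```

With equal probabilities the within-segment variance is positive, whereas with probabilities exactly proportional to the store values it is zero, in line with the remark above.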

As usual, a first-order Taylor series approximation may be used to approximate the variance
of the ratio estimator. See Wolter (1986) for appropriate techniques to estimate the variance
of both the unbiased estimator, $\hat{Y}_{t'i}$, and the ratio estimator, $\hat{Y}_{t'Ri}$.

As time passes, it will be necessary to periodically update the sample to reflect additional 
births and other changes in the universe. It may be desirable to schedule the updating at regular 
intervals of time, so as to facilitate management of the work. We will refer to these intervals 
as update cycles. Such cycles may occur monthly, bimonthly, quarterly, or at whatever interval 
makes sense in a particular application. Factors to consider in establishing the frequency of 
the updating cycles include cost of the updating process; desired accuracy of the estimators 
of level and trend; and perceptions of the customers or users of the data. 

Generally speaking, more frequent updating will cost more, achieve greater accuracy, and 
be perceived better by customers than less frequent updating. 

For an update cycle at any future time t', Rules 1 or 1A may be used to maintain the sample.
New stores are always placed automatically in their correct segment, by their Peano key values,
and the subscript b reflects this order at each cycle. To explicitly reflect these ideas, we should
have further subscripted the U's, B's, p's, and $\pi$'s by time, but we avoided doing so as a notational
convenience. The expressions for the estimators of total, $\hat{Y}_{t'i}$ and $\hat{Y}_{t'Ri}$, and their
variances remain valid for each t'.


4.2 Updating for Deaths 


Rules for maintaining a sample over time must obey an important general principle. They 
must treat equally both selected and nonselected units. In the case of deaths, this principle 
implies that all deaths, both those in and out of the sample, must be handled in the same fashion 
in any sample updating process. If this principle is not followed, the resulting estimators will 
be biased, and the bias may accumulate over time. 

In what follows, we describe procedures for death updating that follow this essential 
principle. There are two cases to consider: (i) deaths are not known on a universe basis, 
(ii) deaths are known on a universe basis. 
For case (i), we suggest Rule 2.


Rule 2. All deaths in the sample will be known. They should remain in the sample 
but be set to zero (i.e., y = 0) at the time of an update cycle. 


This rule permits unbiased estimation of the universe population totals. Deaths cause the 
estimator variances to increase, and estimators of variance will properly reflect this increase, 
provided the deaths are retained in the sample with zero values. 


For case (ii), we suggest Rule 3. 


Rule 3. Remove all deaths from the universe at the time of the next update cycle. 
Subject only the remaining live cases to sampling, including births. 


Rule 3 will cause the store count $B_{ij}$ to change in segments where deaths have occurred,
unless births exactly offset deaths. A replacement store will necessarily be selected within a
given segment whenever the sample store from the segment has died -- except when there is
a death but no birth and $B_{ij} = 0$ -- and a replacement store may be selected even when the
sample store is alive and well.

In the exceptional case, where $B_{ij} = 0$, the sample size drops by 1. An interesting problem
for future research is to investigate the mean square error of this rule versus that of an alter- 
native rule which selects a replacement store from the same zone of k stores, instead of 
permitting the sample size to drop by 1. This alternative is conditionally unbiased but uncon- 
ditionally biased. 

Two additional issues must be addressed in handling deaths. The first issue concerns the 
coordination of birth and death updating. Store births and deaths will occur naturally at 
irregular intervals, depending upon business conditions and population growth. In some time 
periods, neither births nor deaths will occur. In other time periods, births may occur but not 
deaths, or vice versa. In still other periods, both deaths and births will occur. In theory, it
would be possible to employ different update cycles for store births and deaths. For example,
one might update bimonthly for both births and deaths, but in alternating months. This
approach may have the advantage of leveling the work load over time. On the other hand, alternating
cycles may tend to defeat the ability of the sample to properly measure trends, creating
a sawtooth pattern in the store time series as first births are introduced, then deaths dropped, 
then births, deaths, and so on. On balance, we recommend coincident sample updating for 
births and deaths so as to preserve trends. 

The second issue concerns the handling of deaths during the period from their actual 
occurrence until the next update cycle. This issue arises only if the frequency of the updating 
process is less than that of the data-collection process. If the two processes are coincident, then 
there are no new problems. If updating is less frequent, then there are two alternatives:


a) drop the deaths from the sample as soon as they are known to us (to be more precise 
statistically, this means the deaths are included in the sample with a value of zero) 


b) continue the deaths in the sample by imputing for them until the time of the next update 
cycle. 


Alternative a) is the simplest, cleanest way of proceeding. Aside from the problem of births, 
it is unbiased and permits correct variance estimators. Because of the birth problem, however, 
this alternative may have a negative effect on the ability of the sample to properly measure 
trends. As deaths occur during the first weeks of an update cycle, one can imagine a slight decline 
in the store time series, not because of fundamental change in economic conditions, but simply 
because the sample reflects deaths and not births. Alternative b) provides a short term fix to 
the problem of properly measuring trends. The essential notion here is that by imputing for 
deaths, we implicitly make a correction for any births that have occurred since the last update
cycle. This fix is not particularly elegant, and it is difficult to frame a rigorous, unassailable 
technical justification for it. On the other hand, history has shown that populations of eco- 
nomic establishments tend to be stable in the short run. Deaths are often associated with or 
are compensated by births, with the net size of the population remaining approximately level 
in the short run. The United States Bureau of the Census has used this alternative in its whole- 
sale trade survey, with quarterly update cycles and monthly data collection. See Wolter et al. 
(1976). 


4.3 Chronically Nonusable Stores or Scanning Conversions 


In this final subsection, we present sample maintenance rules for handling stores that are 
chronically nonusable, such as stores that do not scan; do scan but with such poor discipline 
as to render their data faulty and nonusable; or refuse to participate in the survey. We shall 
explicitly discuss nonscanning stores and sample maintenance rules for handling conversions 
from nonscanning to scanning and vice versa, although the material that follows may be seen 
to apply more generally to all conditions of chronic nonusability. We shall let A denote the 
set of scanning stores and B the set of nonscanning stores, where A U B spans the entire 
universe. 

First, we treat conversions to scanning. There are two principal cases to consider: (i) scanning 
status is known for all stores prior to sampling; (ii) scanning status is not known prior to 
sampling, but is observed after sampling for the selected stores only. 


Case (i) is relatively easy to handle. Here is a natural rule: 


Rule 4. Do not subject nonscanning stores B to sampling. Sample only from the 
subuniverse of scanning stores A. As a given nonscanning store converts to scanning, 
then treat it as a birth, subjecting it to birth sampling. Prior to conversion, non- 
scanning stores B shall be represented in the universe by utilizing imputation or other 
missing data techniques. 


Given this rule and the prior data (i.e., scanning status) it assumes, the entire survey budget 
may be allocated to the sample of scanning stores. None of the sample resources need to be 
committed to nonscanning stores. 


To address case (ii), let s denote the selected sample of stores, and let $s_A = s \cap A$ and
$s_B = s \cap B$. By assumption, $s_A$ and $s_B$ are not observed until after initial field work is
completed. Obviously, all of these sets vary with time, but we suppress explicit time subscripts
to simplify the notation.

Sample $s_A$ should be maintained by rules presented elsewhere in this paper for births and
deaths. New rules are required to handle $s_B$. Here is an illustrative rule that treats the stores
in $s_B$ as nonrespondents.


Rule 5. At time t, impute for store $U_{ijb} \in s_B$ the value $\hat{y}_{tijb} = x_{tijb}\, y_{At} / x_{At}$, where
$x_{tijb}$ is the value of an auxiliary variable for store $U_{ijb}$, $y_{At}$ is the sample $s_A$ total
for the estimation variable, and $x_{At}$ is the corresponding total for the auxiliary
variable. Alternatively, imputation may occur by means of substitution, hot
deck/matching, or other means. Now, act as if the data set is complete, applying
standard estimators of the survey parameters of interest. At the time $U_{ijb}$ converts
to scanning, it shall be deleted from $s_B$ and joined to $s_A$, and the estimation shall
still be performed by means of the standard estimators applied to the completed
data set.
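The ratio imputation in Rule 5 amounts to scaling each nonscanning store's auxiliary value by the ratio of the scanning-sample totals. A minimal Python sketch follows; the function name and the toy figures are assumptions for illustration only.

```python
def ratio_impute(x_nonscanning: list, y_scanning: list, x_scanning: list) -> list:
    """Impute x * (y_A / x_A) for each nonscanning sample store.

    y_scanning and x_scanning hold the estimation and auxiliary variables
    for the scanning sample stores; x_nonscanning holds the auxiliary
    variable (for example, ACV) for the nonscanning sample stores.
    """
    x_total = sum(x_scanning)
    if x_total == 0:
        raise ValueError("auxiliary total for scanning stores must be nonzero")
    rate = sum(y_scanning) / x_total
    return [x * rate for x in x_nonscanning]


# Scanning stores sell 2 units of the item per unit of the auxiliary variable.
imputed = ratio_impute([25, 10], y_scanning=[120, 80], x_scanning=[60, 40])
```

The imputed values are then appended to the observed values, and the standard estimators are applied to the completed data set.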
Given Rule 5, the effective sample size is reduced because of imputation variance associated
with the $\hat{y}_{tijb}$. Substitution maintains a larger effective sample size than the other rules, but
is clearly the most expensive to implement. All rules require limited field work on a continuous
basis to monitor the scanning status of $U_{ijb} \in s_B$.

As an alternative to missing data techniques, we may observe the nonscanning stores using 
an alternative mode of data collection. Depending upon the data to be collected, this could 
involve a store audit or an interview conducted with store personnel by telephone, mail, or 
in person. This alternative would likely be more accurate than the imputation-based methods, 
yet additional cost and time may be involved, as well as burden associated with the manage- 
ment and control of two data collection methodologies. 

Finally, we treat conversions of sample stores from scanning to nonscanning. Such con- 
versions are likely to be relatively small in number and are treated here only for completeness. 
Let $U_{ijb} \in s_A$, i.e., it is a scanning store in the sample. Note that $U_{ijb}$ may be either a store that
has scanned since being selected into the sample, or a store that converted to scanning after 
originally entering the sample as a nonscanner under Rule 5. 


Rule 6. At the time $U_{ijb}$ converts to nonscanning, it shall be deleted from $s_A$, joined
to $s_B$, and subsequently handled by missing data techniques, as in Rule 5. Standard
formulae shall be applied to the completed data set. To simplify processing and 
field work, the method selected shall be identical to the method selected to handle 
conversions from nonscanning to scanning. 


In the bizarre instance in which a store flip-flops repeatedly between scanning and nonscanning,
one may handle the store by sequentially applying Rule 5 or 6, as the case may be,
each time updating the sets $s_A$ and $s_B$.
ABSTRACT


Papers by Scott and Smith (1974) and Scott, Smith, and Jones (1977) suggested the use of signal extraction 
results from time series analysis to improve estimates in repeated surveys, what we call the time series 
approach to estimation in repeated surveys. We review the underlying philosophy of this approach, 
pointing out that it stems from recognition of two sources of variation - time series variation and sampling 
variation - and that the approach can provide a unifying framework for other problems where the two 
sources of variation are present. We obtain some theoretical results for the time series approach regarding 
design consistency of the time series estimators, and uncorrelatedness of the signal and sampling error 
series. We observe that, from a design-based perspective, the time series approach trades some bias for 
a reduction in variance and a reduction in average mean squared error relative to classical survey estimators. 
We briefly discuss modeling to implement the time series approach, and then illustrate the approach by 
applying it to time series of retail sales of eating places and of drinking places from the U.S. Census 
Bureau’s Retail Trade Survey. 


KEY WORDS: Repeated surveys; Time series; Signal extraction; U.S. Retail Trade Survey. 


1. INTRODUCTION 


Papers by Scott and Smith (1974) and Scott, Smith, and Jones (1977), hereafter SSJ, 
suggested the use of signal extraction results from time series analysis to improve estimates 
in repeated surveys. If the covariance structure of the usual survey estimates ($Y_t$) and their
sampling errors ($e_t$) for a set of time points is known, these results produce the linear functions
of the available $Y_t$'s that have minimum mean squared error as estimators of the population
values being estimated (say $\theta_t$) for $\theta_t$ a stochastic time series. To apply these results in
practice one estimates a time series model for the observed series $Y_t$ and estimates the
covariance structure of $e_t$ over time using knowledge of the survey design.

Section 2 of this paper gives a brief overview of the basic results and framework for the 
time series approach. Section 3 considers some theoretical issues and section 4 some applica- 
tion considerations for the approach. In section 5 we illustrate the approach with an example 
using two time series from the Census Bureau’s Retail Trade Survey. 


2. BASIC IDEAS AND GENERAL CONSIDERATION 
OF THE TIME SERIES APPROACH 


The basic idea in using time series techniques in survey estimation that distinguishes it 
from the classical approach is the recognition of two sources of variability. Classical survey 
estimation deals with the variability due to sampling - having not observed all the units in 
the population. Time series analysis deals with variability arising from the fact that a time series 
is not perfectly predictable (often linearly) from past data. Consider the decomposition: 


1 William R. Bell is Principal Researcher, Statistical Research Division, Bureau of the Census, Washington, D.C. 20233,
U.S.A., and Steven C. Hillmer is Professor, School of Business, University of Kansas, Lawrence, KS 66045, U.S.A.

$$Y_t = \theta_t + e_t, \qquad (2.1)$$


where $Y_t$ is a survey estimate at time t, $\theta_t$ is the population quantity of interest at time t, and
$e_t$ is the sampling error. The sampling variability of $e_t$ is the focus of the classical survey
sampling approach, which regards the $\theta_t$'s as fixed. From a time series perspective all three
of $Y_t$, $\theta_t$, and $e_t$ can exhibit time series variation, as long as they are random and not perfectly
predictable from past data. Standard time series analysis would treat $Y_t$ directly and ignore
the sampling error in the decomposition (2.1), not treating $e_t$ explicitly, but only indirectly
in the aggregate $Y_t$. In fact, time series analysts typically behave as if the sampling variation
is not present and the true values are actually observed. The most basic thing to keep in
mind about the use of time series techniques in survey estimation is that there are two distinct
sources of stochastic variation present that are conceptualized, modeled, and estimated
differently.


2.1 Signal Extraction Results 


Suppose that survey estimates $Y_t$ are available at a set of time points labelled t = 1, ..., T.
Let $Y = (Y_1, \ldots, Y_T)'$ and similarly define $\theta$ and $e$ so we have $Y = \theta + e$. Assuming the
estimates $Y_t$ are unbiased and $\theta_t$ and $e_t$ are uncorrelated (see section 3.2),


$$E(Y) = E(\theta) \equiv \mu = (\mu_1, \ldots, \mu_T)'$$

$$\Sigma_Y = \Sigma_\theta + \Sigma_e, \qquad (2.2)$$


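where E denotes expectation over both the sampling and time series model distributions, and
$\Sigma_Y$ is the covariance matrix of Y, etc. Here $\mu$ and $\Sigma_\theta$ refer to the time series structure of $\theta_t$,
which is not subject to sampling variation. If $Y_t$, $\theta_t$, and $e_t$ do not require differencing, it is
well known that, since $\mathrm{Cov}(\theta, Y) = \Sigma_\theta$, using (2.2) the minimum mean squared error linear
predictor of $\theta$ can be written


$$\hat{\theta} = \mu + \Sigma_\theta \Sigma_Y^{-1} (Y - \mu) \qquad (2.3)$$
$$= \mu + (I - \Sigma_e \Sigma_Y^{-1})(Y - \mu) \qquad (2.4)$$
$$= \mu + (\Sigma_\theta^{-1} + \Sigma_e^{-1})^{-1} \Sigma_e^{-1} (Y - \mu). \qquad (2.5)$$


Another standard result is that the variance of the error of this estimate is

$$\mathrm{Var}(\hat{\theta} - \theta) = \Sigma_e - \Sigma_e \Sigma_Y^{-1} \Sigma_e = (\Sigma_\theta^{-1} + \Sigma_e^{-1})^{-1}. \qquad (2.6)$$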


If normality of $(\theta, Y)$ is assumed, (2.3) - (2.5) give $E(\theta \mid Y)$, the conditional expectation of
$\theta$ given Y, and (2.6) gives $\mathrm{Var}(\theta \mid Y)$, the conditional variance.
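To make the signal extraction results concrete, the following Python sketch evaluates the predictor in the form (2.3) and an error covariance of the form $\Sigma_e - \Sigma_e \Sigma_Y^{-1} \Sigma_e$ for a toy case with T = 2. It is an illustration under assumed inputs, not an implementation used in this paper: the helper names are hypothetical, the covariance matrices are invented, and exact rational arithmetic is used only so that the small example can be checked by hand.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mat_add(a, b, sign=1):
    """Elementwise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_inv(a):
    """Gauss-Jordan inverse in exact arithmetic (assumes a is nonsingular)."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def signal_extract(y, mu, sigma_theta, sigma_e):
    """Return (theta_hat, error_cov) for Y = theta + e with uncorrelated parts."""
    sigma_y_inv = mat_inv(mat_add(sigma_theta, sigma_e))     # inverse of (2.2)
    resid = [[Fraction(a) - Fraction(b)] for a, b in zip(y, mu)]
    adj = mat_mul(mat_mul(sigma_theta, sigma_y_inv), resid)  # gain times (Y - mu)
    theta_hat = [Fraction(m) + a[0] for m, a in zip(mu, adj)]
    shrink = mat_mul(mat_mul(sigma_e, sigma_y_inv), sigma_e)
    return theta_hat, mat_add(sigma_e, shrink, sign=-1)


theta_hat, error_cov = signal_extract(
    y=[12, 9], mu=[10, 10],
    sigma_theta=[[4, 2], [2, 4]],   # assumed signal covariance
    sigma_e=[[1, 0], [0, 1]],       # assumed sampling error covariance
)
```

In this example each estimate is pulled from the observed value toward the mean, and the error variances are smaller than the sampling variances of 1. When the sampling error covariance is zero the predictor returns Y unchanged.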

If $Y_t$ requires differencing the preceding results need to be modified. Assume $e_t$ does not
require differencing, but $\theta_t$ and $Y_t$ need to be differenced once (i.e. by applying $1 - B$ where
$BY_t = Y_{t-1}$). Let the differenced data be $W_t = (1 - B)Y_t = (1 - B)\theta_t + (1 - B)e_t$ for
t = 2, ..., T. Let $\Delta = [\Delta_{ij}]$ be the $(T - 1) \times T$ differencing matrix with $\Delta_{ii} = -1$,
$\Delta_{i,i+1} = 1$, and all other elements zero, and write $\Delta Y = W = \Delta\theta + \Delta e$. Then we use


$$\hat{\theta} = Y - \Sigma_e \Delta' \Sigma_W^{-1} \Delta (Y - \mu), \qquad (2.7)$$
$$\mathrm{Var}(\hat{\theta} - \theta) = \Sigma_e - \Sigma_e \Delta' \Sigma_W^{-1} \Delta \Sigma_e. \qquad (2.8)$$


The expressions (2.7) and (2.8) also apply when $\theta_t$ and $Y_t$ require a more general differencing
operator (e.g. seasonal differencing), with appropriate definition of the differencing matrix
$\Delta$, as long as $e_t$ does not require differencing. These results are analogous to (2.4) and (2.6),
but with $\Delta' \Sigma_W^{-1} \Delta$ playing the role of $\Sigma_Y^{-1}$. The results are given in Bell and Hillmer (1990),
where their optimality properties are discussed. They were essentially given by Jones (1980),
but without real justification.
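The first-difference matrix described above is easy to construct. The following Python sketch, with hypothetical function names and an invented series, builds the $(T - 1) \times T$ matrix and applies it to a short series.

```python
def differencing_matrix(t: int) -> list:
    """(T-1) x T matrix with -1 on the diagonal and +1 immediately to its right."""
    delta = [[0] * t for _ in range(t - 1)]
    for i in range(t - 1):
        delta[i][i] = -1
        delta[i][i + 1] = 1
    return delta


def apply_matrix(matrix: list, vector: list) -> list:
    """Matrix-vector product for a list of rows."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


y = [100, 103, 101, 106]                          # toy series, T = 4
w = apply_matrix(differencing_matrix(len(y)), y)  # W_t = Y_t - Y_{t-1}, t = 2..T
```

A seasonal differencing matrix could be built in the same way by placing the +1 entry s columns to the right of the -1 entry, giving a $(T - s) \times T$ matrix.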

Scott and Smith (1974) and SSJ used classical signal extraction results equivalent to (2.3) - 
(2.6) based on covariance generating functions rather than covariance matrices. Bell (1984) 
considers such results for models involving differencing. Another approach (Binder and Dick 
1989, Bell and Hillmer 1989) involves putting time series models for $\theta_t$ and $e_t$ in state space
form and using the Kalman filter and smoother, which can be viewed as an efficient way to 
compute the matrix results given above. Also, see Tam (1987) for use of the Kalman filter in 
an explicitly model-based approach to analysis in repeated surveys. In subsequent discussions 
we generally refer to the results (2.3) - (2.6), though our remarks easily extend to cover the 
use of (2.7) - (2.8). 

In many cases, for time series $Y_t$ and $\theta_t$ that are always positive, we will want to take
logarithms of $Y_t$ to help induce stationarity of $\theta_t$ and the sampling errors. In such cases we
rewrite (2.1) as


    $Y_t = \theta_t(1 + \tilde{u}_t) = \theta_t u_t$,    (2.9)

where $\tilde{u}_t = e_t/\theta_t$ and $u_t = 1 + \tilde{u}_t$. Taking logs we get

    $\log(Y_t) = \log(\theta_t) + \log(1 + \tilde{u}_t) = \log(\theta_t) + \log(u_t)$.    (2.10)


Letting $\mu$ and $\Sigma_\theta$ now refer to $\log(\theta) = (\log(\theta_1), \ldots, \log(\theta_T))'$, and $\Sigma_Y = \Sigma_\theta + \Sigma_u$ refer
to $\log(Y)$, analogous to (2.4) our estimate is


    $\widehat{\log(\theta)} = \mu + [I - \Sigma_u \Sigma_Y^{-1}](\log(Y) - \mu)$.    (2.11)


The analogues to (2.6) - (2.8) are obvious. To estimate $\theta_t$ we use $\exp[\widehat{\log(\theta_t)}]$; alternatively,
one could use $\exp[\widehat{\log(\theta_t)} + \mathrm{Var}(\widehat{\log(\theta_t)} - \log(\theta_t))/2]$ for a more "unbiased" estimate of
$\theta_t$ with minimum mean squared error (see Granger and Newbold 1976).
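The two back-transformed estimates can be compared in a small sketch. It assumes the reading of (2.11) as $\mu + [I - \Sigma_u \Sigma_Y^{-1}](\log(Y) - \mu)$ and takes the error variance from the analogue of (2.6), $\Sigma_u - \Sigma_u \Sigma_Y^{-1} \Sigma_u$. The function name and all numerical values are hypothetical.

```python
import numpy as np

def log_signal_extraction(Y, mu, Sigma_theta, Sigma_u):
    """Apply (2.11) to log(Y) and return both back-transformed estimates of theta."""
    Sigma_Y = Sigma_theta + Sigma_u
    gain = Sigma_u @ np.linalg.inv(Sigma_Y)            # Sigma_u Sigma_Y^{-1}
    log_hat = mu + (np.eye(len(Y)) - gain) @ (np.log(Y) - mu)
    err_var = np.diag(Sigma_u - gain @ Sigma_u)        # analogue of (2.6) on the log scale
    naive = np.exp(log_hat)                            # exp of the log-scale estimate
    adjusted = np.exp(log_hat + err_var / 2.0)         # with the variance/2 correction
    return naive, adjusted

# Hypothetical inputs: covariances on the log scale, series level near 1000.
T = 4
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Sigma_theta = 0.04 * 0.8 ** lags
Sigma_u = 0.01 * 0.5 ** lags
mu = np.full(T, np.log(1000.0))
Y = np.array([950.0, 1020.0, 1100.0, 990.0])
naive, adjusted = log_signal_extraction(Y, mu, Sigma_theta, Sigma_u)
```

The adjusted estimate is never below the naive one, since the correction adds a nonnegative variance term inside the exponential.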

Notice that (2.3) - (2.6) require knowledge of $\mu$ and any two of $\Sigma_\theta$, $\Sigma_Y$, and $\Sigma_e$ (the third
can be obtained from (2.2)). In practice these will not be known exactly and will need to be
estimated. Thus, the true minimum mean squared error linear predictor $\hat{\theta}$ cannot be obtained
exactly and (2.6) or (2.8) understates the mean squared error (MSE) since it does not account
for modeling errors. (See Binder and Dick (1989) and Eltinge and Fuller (1989).) The basic
assumption underlying the application of the preceding results, which we shall call the time
series approach to survey estimation, is that $\mu$ and $\Sigma_Y$ can be well-estimated from the time
series data on $Y_t$ through a time series model, and $\Sigma_e$ can be well-estimated using survey
microdata and knowledge of the survey design (possibly also using a model). We discuss these
issues further in section 4 and illustrate the approach with the example of section 5.


2.2 Some General Considerations of the Time Series Approach 


Smith (1978), Jones (1980), and Binder and Dick (1986) review and discuss the approach
known as Minimum Variance Linear Unbiased Estimation (MVLU). While both the MVLU
and time series approaches can use data from time points other than $t$ in estimating $\theta_t$, they
differ in that MVLU regards the $\theta_t$'s as fixed and still only treats one source of variation, that
due to sampling. MVLU was developed for cases (such as many rotating panel surveys) where
more than one direct estimate of $\theta_t$ is available for each $t$ and the $e_t$'s are correlated over time
due to overlap in the survey design. The use of $Y_j$ for $j \neq t$ in estimating $\theta_t$ then comes from
generalized least squares results and the correlation of the $e_t$'s. We can see the distinction in
terms of our results for the simple case (2.1) where only one direct estimate, $Y_t$, of $\theta_t$ is available,
by letting $\mathrm{Var}(\theta_t) \to \infty$ to get the MVLU. Then $\Sigma_Y^{-1} \to 0$ and (2.5) becomes $\hat{\theta} = Y$, so
without multiple estimates of $\theta_t$ the MVLU just uses $Y_t$ to estimate $\theta_t$. These remarks apply
generally to composite estimation (Rao and Graham 1964, Wolter 1979), which is often used
as an approximation to MVLU.

One question that may arise regarding the time series approach is why one should consider
$\theta_t$ a stochastic time series. This issue has been discussed by SSJ and at length by Smith (1978).
They observe that (1) users of data from repeated surveys treat the data $Y_t$ as a stochastic time
series in modeling and would do the same with $\theta_t$ if it were available (as it essentially is for
surveys with very low levels of error), and (2) classical results (e.g. Patterson 1950) for estimation
in repeated surveys (MVLU) assume a time series structure for the individual units in the
population, while maintaining the anomalous position that $\theta_t$, which is a function of these
individual units (such as the total), is a sequence of fixed, unrelated quantities. In fact, if we
assume $\theta_t$ is a sequence of fixed, unrelated quantities, then data through any time point are
irrelevant to the future behavior of the true series $\theta_t$. If this were the case, then there would
be little point in doing the survey in the first place. The data would be out of date as soon as
they were published. The real questions here are whether or not we can estimate the time series
structure of $\theta_t$ and $e_t$ well enough to make beneficial use of this in survey estimation, how
worthwhile these benefits may be, and what risks are involved in doing so.

Along with opportunities for improving estimation in repeated surveys, the time series 
approach offers potential for improved results in other problems where typically only one of 
the two sources of variability is recognized. It also can potentially unify these as subproblems 
under one general approach. Such problems include preliminary estimation in repeated surveys 
(Rao, Srinath, and Quenneville 1989); seasonal adjustment (Wolter and Monsour 1981, 
Hausman and Watson 1985, Pfeffermann 1991); time series trend estimation and the related 
problem of detection of statistically significant change over time (Smith 1978); benchmarking, 
the reconciling of results from a repeated survey with the results from another survey or 
census estimating the same population characteristics (Hillmer and Trabelsi 1987, Trabelsi and 
Hillmer 1990); and inference about time series properties of the true series $\theta_t$ relevant to
economic models (Bell and Wilcox 1990). 

Finally, we note that the decomposition (2.1) or (2.10) does not allow for nonsampling errors,
nor does the time series approach treat them explicitly. Whether nonsampling error is generally
more or less of a problem for the time series approach than for the classical approach is
unclear, but one may wish to consider the possible effects of known or suspected nonsampling
errors on the time series estimators when applying them in particular situations.


3. THEORETICAL CONSIDERATIONS


We now obtain some theoretical results relevant to the time series approach, and some 
properties of the resulting estimators. 


3.1 Consistency of Time Series Estimators 


Following Fuller and Isaki (1981) we let $Y_t^\ell$ (from the $\ell$th sample at time $t$) be a sequence
of estimators of the characteristic $\theta_t^\ell$ of the $\ell$th population at time $t$ where the populations and
samples for $\ell = 1, 2, \ldots$ are nested. (See their paper for details.) Define $Y^\ell$, $\theta^\ell$, $e^\ell$, $\mu^\ell$, $\Sigma_\theta^\ell$, $\Sigma_Y^\ell$,
$\Sigma_e^\ell$, $\hat{\theta}^\ell$, and $\hat{\theta}_t^\ell$ in the obvious fashion. We consider what happens to the time series estimators
$\hat{\theta}^\ell$ when the estimators $Y^\ell$ are consistent, i.e. $Y_t^\ell \to \theta_t^\ell$ in some fashion as $\ell \to \infty$ for $t = 1,
\ldots, T$, with $T$, the length of the series, remaining fixed. For now we assume $\mu^\ell$, $\Sigma_\theta^\ell$, and $\Sigma_e^\ell$
are known for each $\ell$, which generally means the time series models (including their parameter
values) for the components are known. Since $\mu^\ell$ and $\Sigma_\theta^\ell$ are really superpopulation parameters
for the time series, $\theta_t^\ell$, we wish to estimate, we shall assume these are the same for each population
$\ell$, that is, $\mu^\ell = \mu$ and $\Sigma_\theta^\ell = \Sigma_\theta$ (a positive definite matrix) for all $\ell$. This is also partly for
convenience since we could get the same results assuming $\mu^\ell \to \mu$ and $\Sigma_\theta^\ell \to \Sigma_\theta$ as $\ell \to \infty$.

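From (2.5) it would appear that $Y^\ell \to \theta^\ell$ would imply $\hat{\theta}^\ell \to \theta^\ell$ as long as $\Sigma_e^\ell \to 0$. This
condition suggests we need mean square convergence of $Y_t^\ell$ to $\theta_t^\ell$. We thus consider estimators
$Y_t^\ell$ of $\theta_t^\ell$ such that $E[(Y_t^\ell - \theta_t^\ell)^2] = E[(e_t^\ell)^2] \to 0$ as $\ell \to \infty$. Since $E[(e_t^\ell)^2] = \mathrm{Var}(e_t^\ell) +
[E(e_t^\ell)]^2$ this implies both $\mathrm{Var}(e_t^\ell) \to 0$ and $E(e_t^\ell) \to 0$. Assuming $Y_t^\ell \to \theta_t^\ell$ in mean square
for $t = 1, \ldots, T$ thus implies $\Sigma_e^\ell \to 0$. We can now establish

Result 3.1: Consider $\hat{\theta}^\ell = (\hat{\theta}_1^\ell, \ldots, \hat{\theta}_T^\ell)'$ given by (2.4). If $Y_t^\ell \to \theta_t^\ell$ in mean square as $\ell \to \infty$
for $t = 1, \ldots, T$, then $\hat{\theta}_t^\ell \to \theta_t^\ell$ in mean square as $\ell \to \infty$ for $t = 1, \ldots, T$.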


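Proof: From $Y^\ell = \theta^\ell + e^\ell$ with $\Sigma_e^\ell \to 0$ we have $\Sigma_Y^\ell \to \Sigma_\theta$ (even if $\theta^\ell$ and $e^\ell$ are correlated).
From (2.4) we have


    $\hat{\theta}^\ell - \theta^\ell = (Y^\ell - \theta^\ell) - \Sigma_e^\ell (\Sigma_Y^\ell)^{-1}(Y^\ell - \mu)$.    (3.1)

The first term on the right converges to 0 in mean square; the second has mean 0 and variance
$\Sigma_e^\ell (\Sigma_Y^\ell)^{-1} \Sigma_e^\ell \to 0$ as $\ell \to \infty$. Since both terms converge to 0 in mean square so does $\hat{\theta}^\ell - \theta^\ell$.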


Convergence in probability is a more familiar concept in survey sampling. If $Y_t^\ell \to \theta_t^\ell$ as
$\ell \to \infty$ in probability for $t = 1, \ldots, T$ this does not guarantee $\Sigma_e^\ell \to 0$, which is mean square
convergence, a stronger condition. If we assume there are random variables $\zeta_t$ with finite
variance such that $|e_t^\ell| < \zeta_t$ (almost surely) uniformly in $\ell$, then $Y_t^\ell \to \theta_t^\ell$ in probability
implies $Y_t^\ell \to \theta_t^\ell$ in mean square (Chung 1968, p. 64). Therefore, using Result 3.1, we have


Result 3.2: If $Y_t^\ell \to \theta_t^\ell$ in probability as $\ell \to \infty$ for $t = 1, \ldots, T$ and there exist random
variables $\zeta_t$ with finite variance such that $|Y_t^\ell - \theta_t^\ell| < \zeta_t$ (almost surely) uniformly in $\ell$, then
$\hat{\theta}_t^\ell \to \theta_t^\ell$ in probability as $\ell \to \infty$ for $t = 1, \ldots, T$.

These consistency results show that if the errors in the original estimates $Y_t$ of $\theta_t$ are
small ($\Sigma_e$ is small) then the errors $\hat{\theta}_t - \theta_t$ will be small as well. From (3.1) we see this is
because $\hat{\theta} - Y$ becomes small as $\Sigma_e$ becomes small, thus when there is little error in the
original estimates $Y_t$ the time series approach will not change them much. Binder and Dick
(1986) have noted this phenomenon, and also pointed out that in this case it does not matter
what time series model is used. That is, the convergence to 0 of (3.1) depends only on $\Sigma_e^\ell \to 0$
and not on $\mu$ or $\Sigma_\theta$. Hence, the consistency results extend to allowing $\mu$, $\Sigma_\theta$, and also $\Sigma_e^\ell$ to
be replaced by estimates $\hat{\mu}^\ell$, $\hat{\Sigma}_\theta^\ell$, and $\hat{\Sigma}_e^\ell$ (which will generally come from estimated models -
see sections 4 and 5), as long as $\hat{\mu}^\ell$ and $\hat{\Sigma}_\theta^\ell$ converge to something as $\ell \to \infty$ (it doesn't matter
what as long as the limit of $\hat{\Sigma}_\theta^\ell$ is positive definite) and $\hat{\Sigma}_e^\ell \to 0$, which should generally hold
when $\Sigma_e^\ell \to 0$. It is also obvious that these results extend to the nonstationary case where $\hat{\theta}$
is given by (2.7) instead of (2.4). While the results show that the time series estimates behave
sensibly in the situation of small error in the original estimates $Y_t$, the gains from the time
series approach will come in the opposite case - when $\mathrm{Var}(e_t)$ is large.
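The point that the choice of signal model stops mattering as the sampling error shrinks can be checked numerically. The sketch below uses the form $\hat{\theta} = Y - \Sigma_e \Sigma_Y^{-1}(Y - \mu)$ underlying (3.1) and scales a fixed sampling error covariance by a factor `c`. The data and the two signal models are hypothetical and deliberately dissimilar.

```python
import numpy as np

def signal_extraction(Y, mu, Sigma_theta, Sigma_e):
    """theta_hat = Y - Sigma_e Sigma_Y^{-1} (Y - mu), with Sigma_Y = Sigma_theta + Sigma_e."""
    Sigma_Y = Sigma_theta + Sigma_e
    return Y - Sigma_e @ np.linalg.solve(Sigma_Y, Y - mu)

T = 12
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Y = 50.0 + 5.0 * np.sin(np.arange(T))        # hypothetical direct estimates
base_Sigma_e = 0.5 ** lags                   # sampling error covariance before scaling
models = {                                   # two deliberately different signal models
    "white noise, mean 50": (np.full(T, 50.0), 9.0 * np.eye(T)),
    "AR(1), mean 0": (np.zeros(T), 25.0 * 0.9 ** lags),
}
max_change = {}
for name, (mu, Sigma_theta) in models.items():
    # Largest adjustment |theta_hat - Y| as the sampling error covariance shrinks.
    max_change[name] = [
        np.max(np.abs(signal_extraction(Y, mu, Sigma_theta, c * base_Sigma_e) - Y))
        for c in (1.0, 1e-2, 1e-4)
    ]
```

Under both models the largest adjustment to $Y$ becomes negligible at the smallest scale factor, even though the second model has a badly wrong mean.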

We can extend the consistency results to the case where we take logarithms and estimate
$\log(\theta_t)$ using (2.11). In this case let $\Sigma_u^\ell = \mathrm{Var}(\log(u^\ell))$ where $\log(u^\ell) = (\log(u_1^\ell), \ldots,
\log(u_T^\ell))'$. If we are taking logarithms it is reasonable to assume $Y_t^\ell$ and $\theta_t^\ell$ remain bounded
away from 0, say $|Y_t^\ell| \geq \kappa$ and $|\theta_t^\ell| \geq \kappa$ (almost surely) for all $\ell$ and for some constant
$\kappa > 0$.


Result 3.3: If $Y_t^\ell \to \theta_t^\ell$ in mean square as $\ell \to \infty$ for $t = 1, \ldots, T$, then $\log(Y_t^\ell) \to \log(\theta_t^\ell)$
and $\widehat{\log(\theta_t^\ell)} \to \log(\theta_t^\ell)$ in mean square as $\ell \to \infty$ for $t = 1, \ldots, T$.


Proof: The analogue to (3.1) is

    $\widehat{\log(\theta^\ell)} - \log(\theta^\ell) = (\log(Y^\ell) - \log(\theta^\ell)) - \Sigma_u^\ell (\Sigma_Y^\ell)^{-1}(\log(Y^\ell) - \mu)$.


If we can show $\Sigma_u^\ell \to 0$ we will have the result since this implies $\log(Y^\ell) \to \log(\theta^\ell)$ in mean
square, and the second term on the right behaves exactly as that in (3.1). Notice


    $E[(\tilde{u}_t^\ell)^2] = E[(e_t^\ell)^2/(\theta_t^\ell)^2] \leq E[(e_t^\ell)^2]/\kappa^2 \to 0$ as $\ell \to \infty$,


thus $E[(\tilde{u}_t^\ell)^2] = E[(u_t^\ell - 1)^2] \to 0$. This implies $\mathrm{Var}(u_t^\ell) \to 0$ and $E(u_t^\ell) \to 1$. By
Jensen's inequality (Chung 1968, p. 45), since $\exp(\cdot)$ is a convex function,


    $1 \leq \exp(E[\log(u_t^\ell)^2]) \leq E(\exp[\log(u_t^\ell)^2]) = E[(u_t^\ell)^2]$.


But $E[(u_t^\ell)^2] = \mathrm{Var}(u_t^\ell) + [E(u_t^\ell)]^2 \to 1$ so $\exp(E[\log(u_t^\ell)^2]) \to 1$ implying $E[\log(u_t^\ell)^2] \to 0$.
This yields $\mathrm{Var}(\log(u_t^\ell)) \to 0$ as desired.


As before we could get a convergence in probability result by imposing a boundedness
condition on the $\log(u_t^\ell)$. Having $\widehat{\log(\theta_t)}$ as an estimate of $\log(\theta_t)$, we have the following
Corollary to Result 3.3 for using $\exp[\widehat{\log(\theta_t)}]$ as an estimate of $\theta_t$.


Corollary 3.4: If $Y_t^\ell \to \theta_t^\ell$ in mean square as $\ell \to \infty$ for $t = 1, \ldots, T$, then (see (2.11))
$\exp[\widehat{\log(\theta_t^\ell)}] \to \theta_t^\ell$ in probability as $\ell \to \infty$ for $t = 1, \ldots, T$.


Proof: Since $\widehat{\log(\theta^\ell)} \to \log(\theta^\ell)$ in mean square implies convergence in probability, the result
follows since $\exp(\cdot)$ is a continuous function (Chung 1968, p. 66).


An analogous result obviously holds for using $\exp[\widehat{\log(\theta_t^\ell)} + \mathrm{Var}(\widehat{\log(\theta_t^\ell)} - \log(\theta_t^\ell))/2]$ to
estimate $\theta_t$, since then $\mathrm{Var}(\widehat{\log(\theta_t^\ell)} - \log(\theta_t^\ell)) \to 0$ as $\ell \to \infty$.


3.2 Uncorrelatedness of $\theta$ and $e$


Standard time series signal extraction results corresponding to (2.3) - (2.8) typically assume
$\theta_t$ and $e_t$ are uncorrelated with each other at all leads and lags (equivalent to independence
under normality). Previous papers on the time series approach to repeated survey estimation
have merely assumed this, but since $\theta_t$ and $e_t$ depend on the same population units it is not obvious
that this assumption is valid. Fortunately, we can establish that it is valid under fairly general
conditions. (Tam (1987) discusses how this fails under an explicitly model-based approach.)

We let $y_{it}$ be the value of the characteristic of interest for the $i$th unit in the population at
time $t$, and let $\Omega_t = \{y_{it}: i = 1, \ldots, N_t\}$ be the collection of all $N_t$ of these units. We consider
time points $t = 1, \ldots, T$ and let $\Omega = (\Omega_1, \ldots, \Omega_T)'$. The $y_{it}$ are random variables,
as is $\theta_t = \theta_t(\Omega_t)$, which is a function of the $y_{it}$. The sample at time $t$, $s_t$ (denoting the indices,
not the values, of the units selected), has probability of selection $p(s_t | \Omega)$. The estimator $Y_t$
of $\theta_t$ is a function of the values $y_{it}$ for the units sampled, thus a function of both $\Omega_t$ and $s_t$, i.e.
$Y_t = Y_t(\Omega_t, s_t)$. We could let $Y_t$ depend on the sample at times other than $t$, but we ignore
that here for simplicity.

We consider estimators $Y_t$ of $\theta_t$ that are design unbiased, which we shall define as
$E(Y_t | \Omega) = \sum_{s_t} Y_t \, p(s_t | \Omega) = \theta_t$. We could alternatively define design unbiasedness as
$E(Y_t | \Omega_t) = \sum_{s_t} Y_t \, p(s_t | \Omega_t) = \theta_t$, and then would need to assume the sample selection
process is such that $p(s_t | \Omega) = p(s_t | \Omega_t)$, so $E(Y_t | \Omega) = E(Y_t | \Omega_t)$. If the sample design
is noninformative then $s_t$ and $\Omega$ are independent, implying $p(s_t | \Omega) = p(s_t | \Omega_t) = p(s_t)$,
and either definition of design unbiasedness reduces to $\sum_{s_t} Y_t \, p(s_t) = \theta_t$. This is the usual
definition, which generally assumes the $y_{it}$, and so $\Omega_t$ and $\theta_t$, are fixed. (The assumption
$p(s_t | \Omega) = p(s_t | \Omega_t)$ allows the sample selection process at time $t$ ($p(s_t | \Omega)$) to depend on
the population values at time $t$ ($\Omega_t$), but assumes the population values at time points other
than $t$ ($\Omega_j$ for $j \neq t$) offer no additional information on $s_t$ beyond that in $\Omega_t$. This might occur
if sampling was with probability proportional to the size of an auxiliary variable at time $t$ that
was correlated with the $y_{it}$ only at time $t$.) The assumptions we make here might even be
generalized.


Result 3.5: If $Y_t$ is design unbiased for all $t$ then $\theta_t$ and $e_t$ are uncorrelated time series.


Proof: Consider $\mathrm{Cov}(\theta_t, e_j)$ for any two time points $t$ and $j$. Since $Y_j$ is design unbiased
$E(e_j | \Omega) = E(Y_j - \theta_j | \Omega) = 0$, implying $E[E(e_j | \Omega)] = E(e_j) = 0$. Also $E(\theta_t \cdot e_j | \Omega) =
\theta_t \cdot E(e_j | \Omega) = 0$ implying $E(\theta_t \cdot e_j) = 0$. Thus $\mathrm{Cov}(\theta_t, e_j) = E(\theta_t \cdot e_j) - E(\theta_t)E(e_j) = 0$.
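The argument of Result 3.5 can be checked exactly on a toy superpopulation by enumerating every sample. The sketch below is only an illustration under assumed conditions: a handful of equally likely realizations of $\Omega$ at two time points, simple random sampling without replacement, and the expansion estimator of a total. None of these choices comes from the paper.

```python
import itertools
import numpy as np

def expansion_estimates(y, n):
    """All equally likely SRS (without replacement) estimates N * ybar of the total of y."""
    N = len(y)
    return np.array([N * y[list(s)].mean() for s in itertools.combinations(range(N), n)])

rng = np.random.default_rng(1)
N, n = 6, 3
thetas_t, errors_j, cond_means = [], [], []
for _ in range(5):                                    # five equally likely realizations of Omega
    y_t = rng.gamma(shape=2.0, scale=10.0, size=N)    # unit values at time t
    y_j = 1.1 * y_t + rng.normal(scale=2.0, size=N)   # correlated unit values at time j
    e_j = expansion_estimates(y_j, n) - y_j.sum()     # sampling error at time j, one per sample
    cond_means.append(e_j.mean())                     # E(e_j | Omega)
    thetas_t.extend([y_t.sum()] * len(e_j))           # theta_t paired with every sample
    errors_j.extend(e_j)
cov_theta_e = np.cov(thetas_t, errors_j, bias=True)[0, 1]
```

Since $E(e_j | \Omega)$ vanishes for every realization, the covariance between $\theta_t$ and $e_j$ over the joint distribution is zero up to floating-point error, even though the unit values at the two time points are strongly related.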


Comment: If $E(e_j | \Omega)$ does not depend on $\Omega$ then $e_j$ is said to be "mean independent" of $\Omega$,
which is known to be a stronger condition than $e_j$ and $\Omega$ uncorrelated, though not as strong
as stochastic independence (unless we have normality). This shows that actually we only need
$E(e_t | \Omega) = E(Y_t | \Omega) - \theta_t$ to not depend on $\Omega$ for $\theta_t$ and $e_t$ to be uncorrelated time series.
This would cover cases where $Y_t$ has a constant additive bias (not dependent on $\Omega_t$) as an
estimate of $\theta_t$, or, using approximate Result 3.6 which follows, a constant percentage (multiplicative)
bias.

We now consider the logarithmic decomposition (2.10) when the $Y_t$ are design unbiased.
We assume that $\tilde{u}_t$ is $O_p(r_\ell)$ where $r_\ell \to 0$ as $\ell \to \infty$ in the superpopulation framework of the
previous section, omitting the superscript $\ell$ from random variables here for convenience. (See
Wolter (1985, p. 222) for definition of the order in probability notation $O_p(r_\ell)$. For example,
when estimating a population mean we would often have $\mathrm{Var}(\tilde{u}_t) \leq K/n_{\ell t}$, where $K$ is some
constant and $n_{\ell t}$ is the sample size at time $t$ in the $\ell$th population. Then $\tilde{u}_t = O_p(n_{\ell t}^{-1/2})$ from
Wolter (1985, theorem 6.2.1).) From a Taylor series linearization of $\log(u_t) = \log(1 + \tilde{u}_t)$
we have from Wolter (1985, theorem 6.2.2)


    $\log(u_t) = \tilde{u}_t + O_p(r_\ell^2)$.    (3.2)

Using this we obtain the following.


Result 3.6: If $Y_t$ is design unbiased for all $t$ and $\tilde{u}_t$ is $O_p(r_\ell)$, then to terms that are $O_p(r_\ell^2)$,
$\log(\theta_t)$ and $\log(u_t)$ are uncorrelated time series.

Proof: From theorem 6.2.5 of Wolter (1985) $\mathrm{Cov}(\log(\theta_t), \log(u_j)) = \mathrm{Cov}(\log(\theta_t), \tilde{u}_j) +
O_p(r_\ell^2)$. Notice $E(\tilde{u}_j | \Omega) = E(e_j | \Omega)/\theta_j = 0$ implies $E(\tilde{u}_j) = 0$, and $E(\log(\theta_t)\,\tilde{u}_j | \Omega) =
\log(\theta_t)\,E(\tilde{u}_j | \Omega) = 0$ implies $E(\log(\theta_t)\,\tilde{u}_j) = 0$, so $\mathrm{Cov}(\log(\theta_t), \tilde{u}_j) = 0$, establishing the
result.


3.3 Design-Based Properties of Signal Extraction Estimates 


Unconditionally, $\hat{\theta}$ in (2.3) is unbiased ($E(\hat{\theta}) = E(\theta) = \mu$) and has minimum MSE given
by (2.6). It is easy to see that this is not the case when viewed from a design-based perspective.
Suppose we begin with design-unbiased estimators $Y$, i.e. $E(Y | \Omega) = \theta$. From (2.2) and (2.4)
we have $\hat{\theta} - \theta = (I - \Sigma_e \Sigma_Y^{-1}) e - \Sigma_e \Sigma_Y^{-1}(\theta - \mu)$. With some algebra, we can show the
design bias, variance, and MSE of $\hat{\theta}$ are given by


    $E(\hat{\theta} | \Omega) - \theta = -\Sigma_e \Sigma_Y^{-1}(\theta - \mu)$,
    $\mathrm{Var}(\hat{\theta} - \theta | \Omega) = (I - \Sigma_e \Sigma_Y^{-1}) \Sigma_e (I - \Sigma_Y^{-1} \Sigma_e)$,
    $E[(\hat{\theta} - \theta)(\hat{\theta} - \theta)' | \Omega] = \Sigma_e - \Sigma_e \Sigma_Y^{-1} \Sigma_e$
        $+ \Sigma_e \Sigma_Y^{-1} [(\theta - \mu)(\theta - \mu)' - \Sigma_\theta] \Sigma_Y^{-1} \Sigma_e$.    (3.3)


From a design-based perspective we see use of $\hat{\theta}$ trades bias for a reduction in variance, since
$\Sigma_e - \mathrm{Var}(\hat{\theta} - \theta | \Omega)$ is a positive semidefinite matrix. Whether this reduces the conditional
MSE (3.3) below $\Sigma_e$, the MSE of $Y$, depends on the last two terms in (3.3), and in turn on $\theta$.
There can be particular realizations of $\theta$ for which the conditional MSE of $\hat{\theta}$ exceeds $\Sigma_e$,
though on average signal extraction reduces the MSE by $\Sigma_e \Sigma_Y^{-1} \Sigma_e$, since the unconditional
expectation of the bracketed term in (3.3) is zero. (Of course, (3.3) is unusable in practice since
it depends on $\theta$.) Also, as noted earlier, modeling error will contribute additional MSE to $\hat{\theta}$;
so another fundamental question, more difficult to answer (see Eltinge and Fuller 1989), is
how the real unconditional MSE of $\hat{\theta}$ compares to $\Sigma_e$.
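The "some algebra" behind the design-based MSE can be verified numerically. The sketch below builds the conditional MSE as variance plus squared bias, taking $\mathrm{Var}(e | \Omega) = \Sigma_e$ as an assumption, and compares it with the three-term form of (3.3) as read here, $\Sigma_e - \Sigma_e \Sigma_Y^{-1} \Sigma_e + \Sigma_e \Sigma_Y^{-1}[(\theta - \mu)(\theta - \mu)' - \Sigma_\theta]\Sigma_Y^{-1}\Sigma_e$. The function names and covariance matrices are hypothetical.

```python
import numpy as np

def design_mse(theta, mu, Sigma_theta, Sigma_e):
    """Design bias, variance and MSE of theta_hat given theta, assuming Var(e | Omega) = Sigma_e."""
    T = len(theta)
    A = Sigma_e @ np.linalg.inv(Sigma_theta + Sigma_e)     # Sigma_e Sigma_Y^{-1}
    bias = -A @ (theta - mu)
    var = (np.eye(T) - A) @ Sigma_e @ (np.eye(T) - A).T
    return bias, var, var + np.outer(bias, bias)

def mse_formula_33(theta, mu, Sigma_theta, Sigma_e):
    """Right-hand side of (3.3) in the three-term form described above."""
    A = Sigma_e @ np.linalg.inv(Sigma_theta + Sigma_e)
    d = theta - mu
    return Sigma_e - A @ Sigma_e + A @ (np.outer(d, d) - Sigma_theta) @ A.T

# Hypothetical stationary example.
rng = np.random.default_rng(3)
T = 5
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Sigma_theta = 4.0 * 0.8 ** lags
Sigma_e = 1.0 * 0.5 ** lags
mu = np.zeros(T)
theta = np.linalg.cholesky(Sigma_theta) @ rng.normal(size=T)   # one realization of the signal
bias, var, mse = design_mse(theta, mu, Sigma_theta, Sigma_e)
variance_reduction = Sigma_e - var                             # should be positive semidefinite
```

Averaging `np.outer(theta - mu, theta - mu)` over many draws of `theta` would approach `Sigma_theta`, which is why the bracketed term contributes nothing to the unconditional MSE.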


4. APPLICATION CONSIDERATIONS 


Application of the time series approach to survey estimation requires estimation of the
autocovariance structure of the sampling errors, estimation of the mean and autocovariance
structure of the signal, and computation of the estimates $\hat{\theta}_t$ and $\mathrm{Var}(\hat{\theta}_t - \theta_t)$ as discussed in
section 2. The first two generally involve use of time series models, and are discussed in some
detail in Bell and Hillmer (1989). Here we make some general remarks. We assume the $Y_t$ are
design unbiased estimators of the $\theta_t$. We illustrate application of the methods in the next
section with two time series from the Census Bureau's Retail Trade Survey.

Sampling error autocovariances, $\mathrm{Cov}(e_t, e_{t+k})$, can be estimated in an analogous fashion
to sampling variances, $\mathrm{Var}(e_t)$, which is done routinely and for which many methods are
available. (See Wolter 1985.) In practice, there may be difficulties in linking survey microdata
over time to directly estimate sampling error covariances. Nevertheless, in what follows we
assume we have available such estimates $\widehat{\mathrm{Cov}}(e_t, e_{t+k})$ for some set of time points $t$ and lags $k$.
Unfortunately, if there is a substantial amount of sampling error present (the situation where
time series methods can make a difference), such autocovariance estimates are likely to have
high variances themselves. This suggests some sort of averaging to improve the autocovariance
estimates.

First, if we assume $e_t$ is covariance stationary, so $\mathrm{Cov}(e_t, e_{t+k}) = \gamma_e(k)$ depends on $k$ but
not $t$, then each $\widehat{\mathrm{Cov}}(e_t, e_{t+k})$ is estimating $\gamma_e(k)$ and we can simply average them, i.e. take
$\hat{\gamma}_e(k) = (T - k)^{-1} \sum_t \widehat{\mathrm{Cov}}(e_t, e_{t+k})$ if we have $\widehat{\mathrm{Cov}}(e_t, e_{t+k})$ for $t = 1, \ldots, T - k$. Alternatively,
$\widehat{\mathrm{Corr}}(e_t, e_{t+k}) = \widehat{\mathrm{Cov}}(e_t, e_{t+k})/[\widehat{\mathrm{Var}}(e_t)\,\widehat{\mathrm{Var}}(e_{t+k})]^{.5}$ can be averaged over $t$ to
estimate $\mathrm{Corr}(e_t, e_{t+k})$, which also depends on $k$ but not $t$, and the variance estimates can be
averaged as before.
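The averaging just described amounts to taking means along the diagonals of a matrix of estimated sampling error covariances. A minimal sketch follows, assuming the estimates are held in a $T \times T$ array `C` with `C[t, t+k]` the estimate of $\mathrm{Cov}(e_t, e_{t+k})$; the function names are hypothetical.

```python
import numpy as np

def average_autocovariances(C, max_lag):
    """gamma_hat(k) = (T-k)^{-1} * sum_t C[t, t+k] for k = 0, ..., max_lag."""
    return np.array([np.diagonal(C, offset=k).mean() for k in range(max_lag + 1)])

def average_autocorrelations(C, max_lag):
    """Average the estimated correlations C[t, t+k] / sqrt(C[t, t] * C[t+k, t+k]) over t."""
    sd = np.sqrt(np.diag(C))
    R = C / np.outer(sd, sd)
    return np.array([np.diagonal(R, offset=k).mean() for k in range(max_lag + 1)])

# Check on an exactly stationary covariance matrix with gamma(k) = 2 * 0.7**k.
T = 10
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
C = 2.0 * 0.7 ** lags
gamma_hat = average_autocovariances(C, max_lag=4)
rho_hat = average_autocorrelations(C, max_lag=4)
```

With noisy estimates in `C` the two routes generally give different answers, because the correlation version normalizes each term by its own variance estimates before averaging.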

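Now suppose we are assuming $e_t$ is relative covariance stationary, so $\mathrm{Cov}(e_t/\theta_t, e_{t+k}/\theta_{t+k}) =
\mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k}) = \gamma_u(k)$ depends on $k$ but not $t$. If $\tilde{u}_t$ is $O_p(r_\ell)$ for all $t$, as in section 3.2, then
from (3.2) and theorem 6.2.5 of Wolter (1985), $\mathrm{Cov}(\log(u_t), \log(u_{t+k})) = \mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k}) +
O_p(r_\ell^3) \approx \gamma_u(k)$. Taking $\widehat{\mathrm{Cov}}(e_t, e_{t+k})/(Y_t Y_{t+k})$ as estimates of $\mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k})$, these can be
averaged over $t$ to estimate $\gamma_u(k)$. Alternatively, using corollary 5.1.5 of Fuller (1976) we can
show that $\mathrm{Corr}(\log(u_t), \log(u_{t+k})) = \mathrm{Corr}(\tilde{u}_t, \tilde{u}_{t+k}) + O_p(r_\ell)$, and taking as estimates
of $\rho_u(k) = \mathrm{Corr}(\tilde{u}_t, \tilde{u}_{t+k})$ the ratios $(\widehat{\mathrm{Cov}}(e_t, e_{t+k})/Y_t Y_{t+k})/([\widehat{\mathrm{Var}}(e_t)\,\widehat{\mathrm{Var}}(e_{t+k})]^{.5}/Y_t Y_{t+k}) =
\widehat{\mathrm{Corr}}(e_t, e_{t+k})$, we can average the estimated autocorrelations of $e_t$ over $t$ to estimate $\rho_u(k)$,
which are approximately the autocorrelations of $\log(u_t)$. Relative variance estimates can be
averaged as before.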

Actually, the usual survey estimates of variances and autocovariances will be estimating
$\mathrm{Var}(e_t | \Omega)$ and $\mathrm{Cov}(e_t, e_{t+k} | \Omega)$. These estimates may also be suitable as estimates of
$\mathrm{Var}(e_t)$ and $\mathrm{Cov}(e_t, e_{t+k})$, e.g. if they make sense from a model-based perspective. If not,
and if $Y_t$ is design unbiased so $E(e_t | \Omega) = 0$, then averaging autocovariance estimates over
time still makes sense. First, if $e_t$ is assumed stationary, then $\gamma_e(k) = \mathrm{Cov}(e_t, e_{t+k}) =
E[\mathrm{Cov}(e_t, e_{t+k} | \Omega)]$, so we can average estimates of $\mathrm{Cov}(e_t, e_{t+k} | \Omega)$ to estimate $\gamma_e(k)$.
Or if $e_t$ is relative covariance stationary, then since $E(\tilde{u}_t | \Omega) = E(e_t | \Omega)/\theta_t = 0$, $\gamma_u(k) =
\mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k}) = E[\mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k} | \Omega)] = \mathrm{Cov}(\log(u_t), \log(u_{t+k})) + O_p(r_\ell^3)$, and
estimates of $\mathrm{Cov}(\tilde{u}_t, \tilde{u}_{t+k} | \Omega)$ can be averaged to estimate $\gamma_u(k)$. It is less clear how to justify
averaging estimates of conditional (on $\Omega$) correlations, since $E[\mathrm{Corr}(e_t, e_{t+k} | \Omega)] \neq
\mathrm{Corr}(e_t, e_{t+k})$, though this may be true to a sufficient approximation. In general, approaches
to estimation of sampling error autocovariance structures bear more investigation.

Given an estimate of the sampling error covariance structure, and using any relevant 
information about the design of the survey, we can attempt to determine a time series model 
and its parameters to closely reproduce this structure. This is illustrated in the example of 
section 5. 

We now turn to developing a model for the signal, $\theta_t$. Since the behavior of most published
time series $Y_t$ is dominated by their signals (otherwise, they would not be published), in
developing models for signals $\theta_t$ we can draw on experience modeling time series $Y_t$ without
allowing for sampling error. Such experience suggests use of nonlinear transformations,
differencing, and regression mean functions in the model for $\theta_t$ will be important. The logarithm
is the most common nonlinear transformation used in time series, and taking $\log(Y_t)$
lets us model $\log(\theta_t)$ through (2.10), with consequences for the sampling error discussed above.
The following remarks are given in terms of use of (2.1), but apply equally well to use of
(2.10). While other transformations could be considered, they would not generally yield a
convenient decomposition of transformed $Y_t$ in terms of transformed $\theta_t$ and some sampling
error. Choosing between taking logarithms or not transforming seems sufficient for modeling
many series.

Assuming $e_t$ has mean zero (implied by design unbiasedness) and does not require
differencing, $\theta_t$ and $Y_t$ will require the same differencing and have the same mean function.
The mean function can often be modeled with a linear regression function, $\mu_t = X_t'\beta$,
for some vector of regression variables $X_t$ and parameters $\beta$. We often use ARIMA
(autoregressive-integrated-moving average) models to account for the needed differencing and
to explain the autocovariance structure of the differenced $\theta_t$. A convenient approach to
developing the $\theta_t$ model is to first model $Y_t$ ignoring the sampling error, and then use a model
with the same regression terms and ARIMA order for $\theta_t$. The parameters of the $\theta_t$ model can
then be estimated using the time series data for $Y_t$ and the previously developed model for $e_t$,
holding the parameters in the model for $e_t$ fixed. Diagnostic checking may suggest modifications
to the $\theta_t$ model. The final fitted model can then be used in the signal extraction estimation
of $\theta_t$. The model fitting and signal extraction computations are not trivial; Kalman
filter/smoother algorithms are discussed in Bell and Hillmer (1989). These have been
implemented in some software recently developed in cooperation with members of the time
series staff of the Statistical Research Division of the Census Bureau. This software was used
in the analysis of the next section.


5. EXAMPLE: U.S. RETAIL TRADE SURVEY - SALES 
OF EATING AND DRINKING PLACES 


As an illustrative example we analyze time series of sales (in millions of dollars) of Eating
Places and of Drinking Places, which are estimated in the monthly U.S. Retail Trade Survey.
The Retail Trade Survey has a list panel of large businesses that are selected into the sample
with certainty and report sales every month, and 3 rotating list panels of smaller businesses
that are selected into the sample by stratified simple random sampling. There is also a rotating
panel area sample covering companies not in the list universe. Quarterly, a sample of new firm
births is introduced, and firm deaths as determined from activity checks are removed from
the sample. The rotating panels report current month and previous month sales at intervals
of 3 months for the list sample and 6 or 12 months for the area sample. Horvitz-Thompson
(HT) estimates of current and previous months' sales are constructed; the resulting time series
shall be denoted $Y_t'$ and $Y_{t-1}''$. From these, composite estimators are constructed as described
in Wolter (1979). The final composite estimates will make up our time series $Y_t$. (While it
might be interesting to instead analyze $Y_t'$ and $Y_{t-1}''$ directly, these estimates are not saved for
a long enough period of time for seasonal time series modeling.) Sampling variances are
estimated using the random group method (Wolter 1985, chapter 2) for the list sample with
16 random groups, and the collapsed stratum method for the area sample. Further information
on the survey is given in Isaki et al. (1976), Wolter et al. (1976), Wolter (1979), Garrett,
Detlefsen and Veum (1987), and Bell and Wilcox (1990).

There are several complicating factors in the survey. The sample is redesigned and 
independently redrawn about every five years, with new samples having been introduced in 
September of 1977, and January of 1982 and 1987. This produces a break in the covariance 
structure of $e_t$ every five years, which can be handled by the Kalman filter/smoother as
discussed in Bell and Hillmer (1989). We shall use data from September, 1977 through 
December, 1986, so there is one redrawing of the sample near the middle of our series. When 
a new sample is introduced approximate MVLU estimates are used for the first three months 
before switching to the composite estimates (Wolter 1979). This introduces a transient effect 
into the sampling error autocorrelations that we shall ignore. Finally, the monthly estimates 
are benchmarked to annual totals estimated from an annual survey and from the economic 
census taken every five years. To avoid this complication we use data that are not benchmarked. 
The reader should be aware, however, that for this reason the data used here do not agree with 
published estimates.


Table 1
Sampling Error Correlations for Horvitz-Thompson Estimates

                                            Lag
                                     4     8    12    16    20    24
Eating Places
  Averaged¹                         .72   .7?   .79   .63   .65   .7?
  From (5.1)²                       .75   .69   .81   .60   .53   .61
Drinking Places
  Averaged¹                         .70   .67   .78   .60   .60   .61
  From (5.1)²                       .72   .66   .80   .56   .50   .59
Number of Correlations Averaged      23    19    15    11     7     3
Weights Used in Determining $\phi$'s    1     1     1    .5     0     0


¹ Raw estimates of $\mathrm{Corr}(e_i', e_j')$ and $\mathrm{Corr}(e_{i-1}'', e_{j-1}'')$ were available for all pairs of months from January, 1973
through March, 1975. Averages of the correlations for the lags shown were taken after applying Fisher's transformation,
and the results then transformed back.

² Correlations are shown from model (5.1) for $m = 4$ with parameters $\phi^4 = .604$, $\Phi = .723$ (Eating Places) and
$\phi^4 = .580$, $\Phi = .714$ (Drinking Places). These parameter values were determined to minimize the weighted sum
of squared deviations of the correlations from model (5.1) and the averaged correlations using the weights shown.
Lags 20 and 24 were not used (given zero weight) because of the small number of correlation estimates available
at these lags.


5.1 Development of Sampling Error Models 


Our first step will be to develop a model for the correlation structure of the sampling errors.
Let us write $Y_t' = \theta_t + e_t'$ for the current month ($t$) HT estimate, and $Y_{t-1}'' = \theta_{t-1} + e_{t-1}''$
for the previous month ($t - 1$) HT estimate. We shall use the same models for $e_t'$ and $e_{t-1}''$.
Estimates of $\mathrm{Corr}(e_t', e_{t-1}'')$ are extremely high - typically .98 or higher. While this is partly
artificial (due to businesses reporting the same figure for current and previous month sales,
and possibly due to the way missing values are imputed), in the absence of other information
it is difficult to distinguish characteristics of $e_t'$ from those of $e_{t-1}''$.

Since the three rotating panels in the survey are drawn (approximately) independently
(Wolter 1979), auto- and cross-correlations for $(e_t', e_{t-1}'')$ should be nonzero only for lags that
are multiples of 3. Estimates of such lag correlations can be averaged over time assuming
correlation stationarity. While estimates of lag correlations are not regularly produced for
the Retail Trade Survey, this was done as part of a special study using micro-data (random
group totals) from the Retail Trade Survey sample for January, 1973 through March, 1975,
albeit at a time when the survey had four rotating list panels. Lacking more recent data, we
"averaged" the correlations at lags 4, 8, 12, 16, 20, and 24 for $e_t'$ and $e_{t-1}''$. (This was actually
done after applying Fisher's transformation $.5 \log((1 + r)/(1 - r))$, to make the distribution
of the transformed correlations more symmetric, and then transforming the results
back.) The results are shown in Table 1. They show fairly strong positive correlation in the
sampling errors, and evidence of seasonality from the correlations at lag 12. A possible model
given such data is


    $(1 - \phi^m B^m)(1 - \Phi B^{12}) e_t' = v_{1t}$,    (5.1)


where $m = 4$ for the 4-panel survey, with the same model assumed for $e_{t-1}''$ with $v_{2,t-1}$
replacing $v_{1t}$. ($v_{1t}$ and $v_{2,t-1}$ are white noise with variance $\sigma_v^2$.)

A particularly convenient property of (5.1) is that if the sampling error in each panel would
follow (5.1) with $m = 1$ if it were observed every month, then for any number $m$ (that is a
divisor of 12) of independent panels reporting successively, $e_t'$ follows (5.1). This allows us
to use the 4-panel survey results in Table 1 to estimate $\phi^4$ and $\Phi$, and (assuming $\phi > 0$) convert
these to estimates of $\phi^3$ and $\Phi$, the parameters of the model for the current 3-panel survey.
This was done by finding $\phi^4$ and $\Phi$ to minimize the sum of squared deviations of the correlations
from (5.1) with those of Table 1. (Lags 20 and 24 were dropped, and lag 16 given a weight
of .5, due to the smaller number of correlation estimates that were averaged together at these
higher lags.) This resulted in $\phi^3 = .685$, $\Phi = .723$ for Eating Places, and $\phi^3 = .664$,
$\Phi = .714$ for Drinking Places. The resulting correlations for $m = 4$ from (5.1) are shown in
Table 1, and may be compared to the averaged correlations. More formal statistical estimation
procedures for $\phi^3$ and $\Phi$, as well as a possible test of model fit, could be considered. (We
may pursue this later if sampling error autocorrelation estimates can be produced from more
recent micro-data from the 3-panel survey.)
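The correlations implied by (5.1) can be computed from the fitted parameters. The sketch below reads the model as $(1 - \phi^m B^m)(1 - \Phi B^{12}) e_t' = v_{1t}$ and obtains autocorrelations from a truncated moving average representation; the function name and the truncation horizon are assumptions made for the illustration. It also converts $\phi^4$ to $\phi^3$ by raising to the power 3/4, which presumes $\phi > 0$ as in the text.

```python
import numpy as np

def model_51_autocorrelations(phi_m, Phi, m, lags, horizon=4000):
    """Autocorrelations of (1 - phi_m B^m)(1 - Phi B^12) e_t = v_t.

    phi_m is the coefficient on B^m (that is, phi**m). The MA(infinity) weights
    are truncated at `horizon`, which is ample for the parameter values used here.
    """
    psi = np.zeros(horizon)
    for j in range(horizon):
        psi[j] = 1.0 if j == 0 else 0.0
        if j >= m:
            psi[j] += phi_m * psi[j - m]
        if j >= 12:
            psi[j] += Phi * psi[j - 12]
        if j >= m + 12:
            psi[j] -= phi_m * Phi * psi[j - m - 12]
    gamma0 = psi @ psi
    return np.array([psi[: horizon - k] @ psi[k:] / gamma0 for k in lags])

lags = [4, 8, 12, 16, 20, 24]
rho_eating = model_51_autocorrelations(0.604, 0.723, m=4, lags=lags)
rho_drinking = model_51_autocorrelations(0.580, 0.714, m=4, lags=lags)
phi3_eating = 0.604 ** 0.75       # phi^4 -> phi^3, assuming phi > 0
phi3_drinking = 0.580 ** 0.75
```

Comparing `rho_eating` and `rho_drinking` with the "From (5.1)" rows of Table 1 gives a quick consistency check on the reported parameter values.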

We make the further assumption that Corr(e/,e/’;_,) = p Corr(e/,e/_;,) for all k. To 
justify this, note the population regression of e/’,_,; on e/_, is pe;_, + €, where if € is not 
uncorrelated with e;/, at least it is certainly small since Var(€) = (1 — p’)Var (e/) and p is 
very near 1. With this assumption (5.1) leads to the following bivariate model for (e/ ,e/’;): 


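$$(1 - \phi^3 B^3)(1 - \Phi B^{12}) \begin{bmatrix} e'_t \\ e''_{t-1} \end{bmatrix} = \begin{bmatrix} v_{1t} \\ v_{2,t-1} \end{bmatrix}, \qquad \mathrm{Var}\begin{bmatrix} v_{1t} \\ v_{2,t-1} \end{bmatrix} = \sigma_v^2 \begin{bmatrix} 1 & \rho \\ \rho & 1 \end{bmatrix} \qquad (5.2)$$


with $\rho = \mathrm{Corr}(v_{1t}, v_{2,t-1}) = \mathrm{Corr}(e'_t, e''_{t-1})$. Estimates of $\mathrm{Corr}(e'_t, e''_{t-1})$ are regularly
produced and were available for 1982 through 1986. Averaging these (with Fisher's transformation)
produced $\hat\rho = .985$ for Eating Places and $\hat\rho = .986$ for Drinking Places.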

We can now use (5.2) to derive a model for the sampling error of the linear form of the 
composite estimator (Wolter 1979), which is given by 


$$Y'''_t = (1 - \beta) Y'_t + \beta (Y'''_{t-1} + Y'_t - Y''_{t-1}) \quad \text{(preliminary estimator)},$$
(5.3)
$$Y_t = (1 - \alpha) Y''_t + \alpha Y'''_t \quad \text{(final estimator)}.$$


In the (3-panel) retail trade survey, values of $\alpha = .8$, $\beta = .75$ are used. It is easily seen that
(5.3) also holds for the sampling errors, i.e. with Y replaced by e. We can use the resulting
relations to derive the following equation for $e_t$ in terms of $e'_t$ and $e''_t$:


$$(1 - .75B)\, e_t = .2 e''_t - .75 e''_{t-1} + .8 e'_t. \qquad (5.4)$$

Using (5.2) and (5.4) we then get

$$(1 - .75B)(1 - \phi^3 B^3)(1 - \Phi B^{12})\, e_t = .2 v_{2t} - .75 v_{2,t-1} + .8 v_{1t}. \qquad (5.5)$$


The right hand side is a first order moving average process (Box and Jenkins 1976, p. 121) whose
parameters can be determined given estimates of $\sigma_v^2$ and $\rho$. Thus, (5.5) would yield an ARMA
model for $e_t$.
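
As a hedged sketch of how the moving average parameters could be obtained, the code below matches the lag 0 and lag 1 autocovariances of the right hand side of (5.5) to those of a process $(1 - \eta B)c_t$. It assumes that the only nonzero cross-correlation among the innovations is $\mathrm{Corr}(v_{1t}, v_{2,t-1}) = \rho$; the function name and the default `sigma2_v = 1.0` are ours. Under that assumption, $\rho = 1$ gives $\eta = -.25$ and $\rho = .9375$ gives $\eta = 0$, which agrees with the end points of Table 3.

```python
import math


def ma1_from_composite(rho, alpha=0.8, beta=0.75, sigma2_v=1.0):
    """Return (eta, sigma2_c) so that (1 - eta*B) c_t has the same autocovariances
    as (1 - alpha)*v2_t - beta*v2_{t-1} + alpha*v1_t."""
    a = 1.0 - alpha
    # Assumed: Cov(v1_t, v2_{t-1}) = rho * sigma2_v; every other cross term is zero.
    gamma0 = sigma2_v * (a * a + beta * beta + alpha * alpha - 2.0 * beta * alpha * rho)
    gamma1 = sigma2_v * (-a * beta + a * alpha * rho)
    if abs(gamma1) < 1e-15:
        return 0.0, gamma0  # no lag 1 correlation: the right hand side is white noise
    r1 = gamma1 / gamma0
    # (1 - eta*B) c_t has lag 1 autocorrelation -eta / (1 + eta^2);
    # solve the quadratic and keep the invertible root (|eta| < 1).
    eta = (-1.0 + math.sqrt(1.0 - 4.0 * r1 * r1)) / (2.0 * r1)
    sigma2_c = gamma0 / (1.0 + eta * eta)
    return eta, sigma2_c


eta_drinking, _ = ma1_from_composite(0.986)  # roughly -0.133
```

The innovation variance returned here is expressed in units of `sigma2_v`; in practice it would be rescaled so that the implied variance matches the available relative variance estimates.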

Rather than pursue this further, we shall instead make the rather strong assumption that
a model of the same form holds for $\log(u_t)$ in $\log(Y_t) = \log(\theta_t) + \log(u_t)$, thus


$$(1 - .75B)(1 - \phi^3 B^3)(1 - \Phi B^{12}) \log(u_t) = (1 - \eta B)\, c_t. \qquad (5.6)$$

Table 2

Coefficients of Variation (CV)¹ for Retail Sales Estimates

                     Horvitz-Thompson    Final Composite²    Signal Extraction³
                            CV                  CV             Low        High

Eating Places              .042                .025            .017       .023
Drinking Places            .088                .052            .032       .038

1 CV = (Relative Variance)^{1/2}.
2 The values for the final composite estimator are obtained using models (5.7a,b).
3 The values for signal extraction actually vary over time, being highest at the end of the series and lowest near the
middle. We show the lowest and highest values, which are attained for both series in January 1982 (low) and December
1986 (high). The signal extraction variances are not symmetric in time because the sample redraw in January 1982
is not exactly at the center of the series.


We do this because estimates of sampling variance for these series are highly dependent on
the level of the series; estimates of relative variance are much more stable over time. We also
assume we can use estimates of relative variance and of $\rho$ in determining $\eta$ and $\sigma_c^2$. Estimates
of $\mathrm{Var}(e'_t)$ and $\mathrm{Var}(e''_{t-1})$ were available for 1982 through 1986. The resulting relative
variance estimates were used in the spirit of maximum likelihood estimation for the lognormal
distribution: taking the average of the logs of the relative variance estimates, adding one half
of the sample variance of the logged estimates to this, and exponentiating the results. (Merely
averaging the relative variance estimates produced similar results.) This was done separately
for $\mathrm{Rel\,Var}(Y'_t)$ and $\mathrm{Rel\,Var}(Y''_{t-1})$, and these two results were then averaged, producing a
common relative variance estimate that is constant over time. The results are shown in Table 2
under the heading "Horvitz-Thompson". Using these and the $\hat\rho$'s given earlier, one can solve
for $\eta$ and $\sigma_c^2$ for the right side of (5.6). The resulting sampling error models are


$$(1 - .75B)(1 - .685B^3)(1 - .723B^{12}) \log(u_t) = (1 + .130B)\, c_t \qquad (5.7a)$$


(Eating Places) $\hat\sigma_c^2 = 1.948 \times 10^{-5}$


$$(1 - .75B)(1 - .664B^3)(1 - .714B^{12}) \log(u_t) = (1 + .134B)\, c_t \qquad (5.7b)$$


(Drinking Places) $\hat\sigma_c^2 = 9.301 \times 10^{-5}$.
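
The lognormal-style averaging of the relative variance estimates described before (5.7a,b) can be sketched as below. The function name and the input values are hypothetical, and the sketch assumes the "sample variance" uses the divisor n - 1, which the text does not specify.

```python
import math
import statistics


def lognormal_mean_estimate(rel_variances):
    """exp(mean of logs + half the sample variance of the logs)."""
    logs = [math.log(v) for v in rel_variances]
    mean_log = statistics.mean(logs)
    var_log = statistics.variance(logs)  # assumed divisor: n - 1
    return math.exp(mean_log + 0.5 * var_log)


# Hypothetical yearly relative variance estimates, for illustration only.
example = [0.0016, 0.0019, 0.0018, 0.0017, 0.0020]
common_rel_var = lognormal_mean_estimate(example)
```

When the yearly estimates vary little, the correction term is small and the result is close to the plain average, which is consistent with the remark that merely averaging produced similar results.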


One can use the method of McLeod (1975, 1977) to solve for $\mathrm{Var}(\log(u_t))$ in these models,
which is an estimate of the relative variance of the final composite estimator. The results are
shown in Table 2. The corresponding coefficients of variation, .025 for Eating Places and .052
for Drinking Places, are quite close to estimates published in the Census Bureau's Monthly
Retail Trade Reports that are obtained more directly.


5.2 Time Series Modeling and Signal Extraction 


Figure 1.a  Retail Sales of Eating Places: Composite Estimates (not benchmarked)
Figure 1.b  Retail Sales of Drinking Places: Composite Estimates (not benchmarked)

Figures 1a,b show plots of the time series of final composite estimates $Y_t$ for Eating Places
and for Drinking Places, respectively. To develop models for $\theta_t$ we shall begin by modeling
the $Y_t$ series directly. Both series show trends and strong seasonality, with the magnitude
of the seasonal fluctuations larger the higher the level of the series. This suggests taking
logarithms and the need for differencing; both are typical for economic time series. Examination
of sample autocorrelations for $\log(Y_t)$ and its differences suggested the difference operator
$(1 - B)(1 - B^{12})$ for both series. Retail trade series are known to contain trading-day
variation, which can be modeled by including seven regression variables in the model: $X_{1t}$ =
number of Mondays in month t, ..., $X_{7t}$ = number of Sundays in month t. Following Bell
and Hillmer (1983), a more convenient parameterization is obtained by using instead the
variables $T_{1t} = X_{1t} - X_{7t}$ (number of Mondays minus number of Sundays), ..., $T_{6t} = X_{6t} - X_{7t}$
(number of Saturdays minus number of Sundays), $T_{7t} = \sum_{i=1}^{7} X_{it}$ (length of month t). To identify
the ARMA structures, the autocorrelations and partial autocorrelations of the residuals from
regressions of $(1 - B)(1 - B^{12}) \log(Y_t)$ on $(1 - B)(1 - B^{12}) T_{it}$, i = 1, ..., 7, were
examined. This suggested an ARIMA $(0,1,2)(0,1,1)_{12}$ model for Eating Places, and an ARIMA
$(0,1,3)(0,1,1)_{12}$ model for Drinking Places. The resulting estimated models were


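$$(1 - B)(1 - B^{12})\Big[\log(Y_t) - \sum_{i=1}^{7} \beta_i T_{it}\Big] = (1 - .25B - .22B^2)(1 - .79B^{12})\, a_t$$


(Eating Places) $\hat\sigma_a^2 = .000230$ (5.8a)


$$(1 - B)(1 - B^{12})\Big[\log(Y_t) - \sum_{i=1}^{7} \beta_i T_{it}\Big] = (1 - .21B - .15B^2 + .03B^3)(1 - .56B^{12})\, a_t$$


(Drinking Places) $\hat\sigma_a^2 = .000587$. (5.8b)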


For brevity, we omit the estimates of the trading-day parameters. While the lag 2 and lag 3
moving average parameters in (5.8b) are small, we shall retain them since we shall only use
(5.8a,b) as starting points for modeling $\log(\theta_t)$ for both series.
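
The trading-day variables defined above are straightforward to construct. The following sketch, with a function name of our own choosing, counts the weekdays in a given month and forms the six contrasts with Sundays plus the length of the month.

```python
import calendar


def trading_day_regressors(year, month):
    """Return [T_1, ..., T_7]: six weekday-minus-Sunday contrasts and the month length."""
    first_weekday, n_days = calendar.monthrange(year, month)  # Monday is 0
    counts = [0] * 7  # counts[0] = number of Mondays, ..., counts[6] = number of Sundays
    for day in range(n_days):
        counts[(first_weekday + day) % 7] += 1
    sundays = counts[6]
    contrasts = [counts[i] - sundays for i in range(6)]
    return contrasts + [n_days]


# January 1982 started on a Friday, so it had five Fridays, Saturdays and Sundays.
print(trading_day_regressors(1982, 1))  # [-1, -1, -1, -1, 0, 0, 31]
```

In a 28-day February every weekday occurs four times, so all six contrasts are zero and only the length-of-month variable is nonzero.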

Taking models of the form of (5.8a,b) for $\log(\theta_t)$ with models (5.7a,b) for $\log(u_t)$, the
parameters of the models for $\log(\theta_t)$ were estimated. For both series the seasonal moving
average parameters were estimated to be very near 1 (.985 for Eating Places and .992 for
Drinking Places), implying nearly deterministic seasonality that can be modeled by cancelling
a $(1 - B^{12})$ from both sides of the $\theta_t$ model and instead including a trend constant and a
seasonal regression function of the form $\sum_{i=1}^{11} \gamma_i M_{it}$, where $M_{1t}$ is 1 in January, -1 in
December, and 0 otherwise, ..., $M_{11,t}$ is 1 in November, -1 in December, and 0 otherwise
(Bell 1987). Estimation of the resulting models produced the following:


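$$(1 - B)\Big[\log(\theta_t) - \sum_{i=1}^{7} \beta_i T_{it} - \sum_{i=1}^{11} \gamma_i M_{it}\Big] = .00762 + (1 - .20B - .29B^2)\, b_t$$


(Eating Places) $\hat\sigma_b^2 = .000139$ (5.9a)


$$(1 - B)\Big[\log(\theta_t) - \sum_{i=1}^{7} \beta_i T_{it} - \sum_{i=1}^{11} \gamma_i M_{it}\Big] = .00352 + (1 - .18B - .09B^2 - .42B^3)\, b_t$$


(Drinking Places) $\hat\sigma_b^2 = .000244$. (5.9b)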


We again omit the estimates of the regression parameters. We do not provide standard errors
for the ARMA parameters; doing so for models of the sort used here is a topic for further
research, made particularly difficult here by the unrealistic assumption that the sampling error
model is known. Examination of standardized residuals produced by the Kalman filter, and of
their autocorrelations, suggested no major inadequacies with the fitted models for either series.

Figure 2.a  Eating Places: Composite (solid) and Signal Extraction (dotted) Estimates
Figure 2.b  Drinking Places: Composite (solid) and Signal Extraction (dotted) Estimates
Figure 2.c  Drinking Places: Alternative Signal Extraction Estimates

The estimated models, (5.7a,b) with (5.9a,b), were used to produce signal extraction
estimates of $\log(\theta_t)$, which were then exponentiated to produce estimates of $\theta_t$. The results
are shown in Figures 2a,b for the series with the estimated seasonal and trading-day effects
removed. Notice that signal extraction makes only slight differences in the estimates for Eating
Places, which contained little sampling error (low relative variance), but it makes a considerable
difference in the estimates for Drinking Places, which contained much more sampling error
(higher relative variance). Signal extraction variances for $\log(\theta_t)$ were also produced; these are
relative variances for the estimates of $\theta_t$. Table 2 shows that, depending on the location in the
series, signal extraction produces about an 8%-32% improvement in CV over the final composite
estimates for Eating Places (though the composite estimate CV is small), and about a
27%-38% improvement in CV for Drinking Places. As noted previously, these results are
optimistic, since they assume the true component models are those that were estimated. To
partly address concerns about this, we next examine the sensitivity of the results for Drinking
Places to variation in the model parameters.
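
The improvement percentages quoted above follow directly from Table 2. A small check, using a helper name of our own, is:

```python
def cv_improvement(cv_composite, cv_signal):
    """Percentage reduction in CV relative to the final composite estimator."""
    return 100.0 * (cv_composite - cv_signal) / cv_composite


# (final composite CV, lowest signal extraction CV, highest signal extraction CV)
table2 = {
    "Eating Places": (0.025, 0.017, 0.023),
    "Drinking Places": (0.052, 0.032, 0.038),
}

for series, (composite, low, high) in table2.items():
    smallest_gain = cv_improvement(composite, high)  # end of the series
    largest_gain = cv_improvement(composite, low)    # near the middle of the series
    print(f"{series}: {smallest_gain:.0f}% to {largest_gain:.0f}%")
```

The smallest gain corresponds to the highest signal extraction CV, which Table 2 places at the end of the series, where current estimates are usually of most interest.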


5.3 Sensitivity Analysis for Drinking Places 


Here we focus on sensitivity of results to variation in the sampling error model, since this 
was determined with less information than the signal model. Our approach is to vary parameters 
of the sampling error model, then reestimate the signal model and redo the signal extraction. 
While it would be preferable to have more formal statistical measures of the signal extraction
error due to model error (which the present state of theory and computer software does
not allow), this approach should at least help indicate in what respects the signal extraction
results are sensitive to parameter variation and in what respects they are not.

Comparing models (5.8b) and (5.9b) gives some indication of the sensitivity of the signal
model to changes in $\sigma_c^2$, the innovation variance of the sampling error model, since (5.8b)
corresponds to $\sigma_c^2 = 0$ and (5.9b) to $\sigma_c^2 = 9.3 \times 10^{-5}$. The most noticeable differences are
in the estimate of $\sigma_b^2$, which is to be expected, and in the estimate of the seasonal moving
average parameter, $\theta_{12}$ say, which was found to be essentially 1 in obtaining (5.9b). Reestimation
of the signal model for other values of $\sigma_c^2$ yielded $\hat\theta_{12} \ge .99$ as long as $\sigma_c^2 \ge 3.0 \times 10^{-5}$. In
light of this, and to simplify presentation of results, we assume $\theta_{12} = 1$ and use a signal model
with seasonal indicator variables as in (5.9b).

Figure 2.c shows (seasonally and trading-day adjusted) signal extraction estimates $\hat\theta_t$
corresponding to sampling error models with $(\phi^3, \Phi)$ = (.564,.614) and (.764,.814), and with $\rho$ =
.986 and $\mathrm{Var}(\log(u_t))$ = .00776 (the relative variance of the Horvitz-Thompson estimates)
held fixed. These cover the extremes of $\hat\theta_t$ for the sensitivity analysis. The nature of the
different estimates $\hat\theta_t$ we have generated seems to roughly correspond to the value of $CV_{56} =
[\mathrm{Var}(\log(\hat\theta_{56}) - \log(\theta_{56}))]^{1/2}$, the signal extraction coefficient of variation achieved at the
middle of the series. ($CV_{56}$ is very close to the lowest value, which is achieved at t = 53; see
Table 2.) The lower $CV_{56}$ is, the smoother $\hat\theta_t$ is. $CV_{56}$ is 2.78%, 3.28%, and 3.70% for $(\phi^3, \Phi)$
equal to (.564,.614), (.664,.714), and (.764,.814) respectively. Other estimates $\hat\theta_t$ we generated
lie closest to the signal extraction estimate in Figure 2.b or 2.c with the closest $CV_{56}$.

We now consider the sensitivity of $CV_{56}$ to variations in the sampling error model
parameters, beginning with $\rho$. The only parameter in (5.7b) affected by a change in $\rho$ is $\eta$. Table
3 reports the values of $\eta$ and corresponding values of $\rho$ considered, and the resulting $CV_{56}$'s.
We see $CV_{56}$ is somewhat sensitive to changes in $\rho$, especially increases: $CV_{56}$ for $\rho = 1$ (3.49)
is 6% larger than for $\rho = .985$ (3.28), the value used for (5.7b).

Table 3
Sensitivity of $CV_{56}$¹ for Drinking Places to Changes in $\eta$ (Changes in $\rho$)

$\eta$        .00      -.05     -.10          -.15     -.20     -.25
$\rho$        .9375    .9642    .9792         .9888    .9953    1.000
$CV_{56}$     3.03     3.12     [illegible]   3.31     3.40     3.49


1 $CV_{56}$ is the signal extraction coefficient of variation for t = 56 (the middle of the series), expressed as a percentage,
i.e. the square root of $\mathrm{Var}(\log(\hat\theta_{56}) - \log(\theta_{56}))$ multiplied by 100.


Table 4
Sensitivity of $CV_{56}$ for Drinking Places to Changes in $\mathrm{Var}(\log(u_t))$¹ (Changes in $\sigma_c^2$)


$\mathrm{Var}(\log(u_t))$     .00676    .00726    .00776    .00826    .00876
CV(HT)²                       8.22      8.52      8.81      9.09      9.36
$\sigma_c^2 \times 10^5$      8.16      8.76      9.30      9.97      10.57
$CV_{56}$                     3.16      3.23      3.28      3.35      3.40


1 $\mathrm{Var}(\log(u_t))$ is the relative variance of the Horvitz-Thompson estimators.
2 CV(HT) is the coefficient of variation of the Horvitz-Thompson estimators, expressed as a percentage, i.e. the square
root of $\mathrm{Var}(\log(u_t))$ multiplied by 100.


Table 5
Sensitivity of Results for Drinking Places to Changes in $(\phi^3, \Phi)$


(i) Values of $\sigma_c^2 \times 10^5$ for given $(\phi^3, \Phi)$

                                   $\phi^3$
$\Phi$       .564     .614          .664     .714     .764
.614        16.90    14.70         12.36     9.98     7.64
.664        15.03    13.00         10.87     8.72     6.62
.714        13.04    11.23          9.30     7.44     5.60
.764        10.96     9.40          7.78     6.15     4.58
.814         8.79    [illegible]    6.17     4.85     [illegible]


(ii) Values of $CV_{56}$ for given $(\phi^3, \Phi)$

                                   $\phi^3$
$\Phi$       .564     .614     .664          .714     .764
.614         2.78     2.88     [illegible]   3.12     3.2[?]
.664         2.95     3.04     3.14          3.26     3.38
.714         3.10     3.19     3.28          3.39     3.50
.764         3.24     3.33     3.42          3.51     3.60
.814         3.36     3.45     3.54          3.62     3.70


We next consider the sensitivity of $CV_{56}$ to changes in $\mathrm{Var}(\log(u_t))$. The only sampling
error model parameter this affects is $\sigma_c^2$. Table 4 reports the values of $\mathrm{Var}(\log(u_t))$, its square
root CV(HT), the corresponding $\sigma_c^2$, and the resulting $CV_{56}$. We see less sensitivity of $CV_{56}$
here than in Table 3.

Finally, we examine the sensitivity of $CV_{56}$ to $\phi^3$ and $\Phi$. Holding $\mathrm{Var}(\log(u_t))$ fixed at
.00776 and changing $(\phi^3, \Phi)$ also changes $\sigma_c^2$. Table 5 reports the grid of values used for $(\phi^3, \Phi)$,
and resulting values of $\sigma_c^2$ and $CV_{56}$. Notice $\sigma_c^2$ varies more here than in Table 4. We see $CV_{56}$
increases substantially as $\phi^3$ and $\Phi$ are increased.

We conclude from this analysis that moderate changes in the sampling error model
parameters have relatively small impacts on $\hat\theta_t$. The largest changes we observed in $\hat\theta_t$ were
around 2 percent. The same moderate changes in the sampling error model parameters have
relatively larger impacts on the signal extraction variances, with $CV_{56}$'s changing by as much
as 17 percent. This suggests that for this example the greatest concern in not knowing the
sampling error model parameters may be in the effect on signal extraction variances, and the
resulting measures of improvement over the composite estimates. However, in all the cases
considered in the sensitivity analysis the signal extraction estimates showed a significant
improvement in variance.


5.4 Conclusions 


The Drinking Places example illustrates the potential gains that may be achieved with the 
time series approach to survey estimation. Both examples also illustrate the complex and delicate 
nature of the time series modeling that may be required. We view the results as preliminary 
for several reasons. First, the optimistic nature of the signal extraction variances that do not
reflect parameter estimation error has been mentioned. Second, we have no clear explanation 
of why the signal extraction estimates lie above or below the composite estimates for long 
stretches of time. (This is obvious in Figure 2.b., and actually the case in Figure 2.a. as well.) 
For the Drinking Places example this behavior was evident throughout the sensitivity analysis, 
and so does not appear to be due to uncertainty in the parameters of the sampling error model. 
We are in the process of exploring whether this may be due to the forms of the sampling error
model or signal model being incorrect. In fact, Bell and Wilcox (1990) report that the sampling
error correlations at lags that are not multiples of three are not necessarily zero, as was assumed by
the model.

Robust Small Area Estimation Combining Time Series
and Cross-Sectional Data 


D. PFEFFERMANN and L. BURCK¹


ABSTRACT 


The common approach to small area estimation is to exploit the cross-sectional relationships of the data 
in an attempt to borrow information from one small area to assist in the estimation in others. However, 
in the case of repeated surveys, further gains in efficiency can be secured by modelling the time series 
properties of the data as well. We illustrate the idea by considering regression models with time varying, 
cross-sectionally correlated coefficients. The use of past relationships to estimate current means raises 
the question of how to protect against model breakdowns. We propose a modification which guarantees 
that the model dependent predictors of aggregates of the small area means coincide with the corresponding 
survey estimators and we explore the statistical properties of the modification. The proposed procedure 
is applied to data on home sale prices used for the computation of housing price indexes. 


KEY WORDS: Kalman filter; Linear constraints; State-space models. 


1. INTRODUCTION 


Statistical Bureaus are often confronted with the demand to provide reliable estimators 
for small area means. The problem with the production of such estimators is that the sample 
sizes within those areas are usually too small to allow the use of direct survey estimators. As 
a result, new estimators have been proposed in recent years which combine auxiliary informa- 
tion (obtained from a census or administrative records) with the survey data obtained from 
all the small areas. The common feature of these estimators is that they can be structured in
general as a linear combination of two components: a "synthetic estimator" of the form $\bar X'_i \hat\beta$
where $\bar X_i$ represents the average auxiliary information at the small area level and $\hat\beta$ is a vector
of estimated regression coefficients; and a "correction factor" of the form $(\bar y_i - \bar x'_i \hat\beta)$ where
$\bar y_i$ and $\bar x_i$ are the sample means of the target and the auxiliary variables. The correction factors
are used to account for the variability of the small area means not explained by the auxiliary
variables. The major difference between the various estimators is in the approach followed
to determine the weights assigned to the two components in the linear combination, ranging
from a "design based approach" (Särndal and Hidiroglou 1989) to "empirical Bayes" (Fay
and Herriot 1979) and "mixed linear models" (Battese, Harter and Fuller 1989, Pfeffermann
and Barnard 1991).

Very few studies are reported in the literature on the possible use of the time series relation- 
ships of the data to further increase the efficiency of the small area estimators. This is despite 
the fact that many of the small area estimators are derived from repeated surveys such as labour 
force surveys. The econometric literature contains a vast number of studies on the combined 
modelling of time series and cross-sectional data, see e.g. Rosenberg (1973b), Johnson (1977, 
1980), Maddala (1977, Chapter 7), Dielman (1983) and Pfeffermann and Smith (1985) for 
reviews. However, none of these studies is directed to the problem of estimating (predicting) 
small area means from survey data. Fitting time series models to survey data has been considered 
in the context of estimating aggregate population means, see the review papers of Smith (1979)
and Binder and Hidiroglou (1988) and the more recent articles by Binder and Dick (1989), Tiller 
(1989) and Pfeffermann (1991). But again, these methods are not in routine use mainly because 
the classical survey estimators of the aggregate means are often almost as efficient when the 
models hold and more robust when the models fail to hold. 

The situation is clearly different when dealing with a small area estimation problem; it seems 
to us that for this kind of problem, the use of time series models can be of great advantage. 
Although the exact nature of the model to be used in a particular application is obviously ‘data 
dependent’, the class of models we consider in the next section is broad enough to apply to 
many, if not most of the small area estimation problems arising in practice. These models have 
the further advantage that their estimation is relatively simple. Estimation issues are discussed 
in Section 3. 

The use of a model always raises the question of how to protect against possible model 
failures and this question becomes even more sensitive when considering the use of a model 
for the production of official statistics. In Section 4 we consider this issue and propose a 
modification to the model dependent predictors which guarantees that for aggregates of the 
small area means for which the direct survey estimators can be trusted, the modified model 
predictors coincide with the survey estimators. The statistical properties of the modified 
predictors are explored. We conclude the article in Section 5 with empirical results which 
illustrate the performance of the model with and without the proposed modification. The data 
used for the illustrations are the sale prices of homes in the city of Jerusalem during the months 
of September 1985 through November 1989. These data are used routinely by the Central 
Bureau of Statistics in Israel for the computation of housing price indexes. 


2. REGRESSION WITH CROSS-SECTIONALLY AND 
TIME VARYING COEFFICIENTS 


2.1 A General Class of Models 


In what follows we denote by $Y_{tk}$ the $n_{tk} \times 1$ vector of observations on a target variable
Y, pertaining to an area k at time t, k = 1, ..., K, t = 1, 2, .... We assume for convenience
that $n_{tk} \ge 1$ but as becomes evident later on, the model permits that some of the areas not
be observed at certain times. Let $X_{tk}$ define the corresponding $n_{tk} \times (p + 1)$ design matrix
of the auxiliary variables with a vector of ones as its first column. In many applications,
the same row vector $x'_{tk}$ of auxiliary values applies to all the Y values of a given time so that
$X_{tk} = 1_{n_{tk}} x'_{tk}$ where $1_{n_{tk}}$ is a column vector of ones of length $n_{tk}$. This is the case when the only
available data are the small area survey estimators. Confidentiality as well as processing costs
often preclude the use of micro data on individual survey respondents. The theory described
in this article is not restricted to the availability of the micro data (see the example in Section 2.2)
but data availability has an obvious effect on model specifications and precision of estimation.


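The regression model holding in area k at time t is defined as

$$Y_{tk} = X_{tk}\beta_{tk} + \epsilon_{tk}; \quad E(\epsilon_{tk}) = 0, \; E(\epsilon_{tk}\epsilon'_{tk}) = \sigma_k^2 I_{n_{tk}} \qquad (2.1)$$


where $\beta'_{tk} = (\beta_{tk0}, \beta_{tk1}, \ldots, \beta_{tkp})$.


We define the (superpopulation) mean of the target variable values in area k at time t to be


$$\theta_{tk} = E(M_{tk} \mid \beta_{tk}) = \bar X'_{tk}\beta_{tk} \qquad (2.2)$$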
where

$$M_{tk} = \frac{1}{N_{tk}} \sum_{i=1}^{N_{tk}} Y_{tki} \quad \text{and} \quad \bar X_{tk} = \frac{1}{N_{tk}} \sum_{i=1}^{N_{tk}} x_{tki}$$

with i = 1, ..., $N_{tk}$ indexing the population units. Obviously, when $x_{tki} = x_{tk}$ then $\bar X_{tk} = x_{tk}$.
Let $\hat\beta_{tk}$ define an estimator for $\beta_{tk}$. Then $\hat\theta_{tk} = \bar X'_{tk}\hat\beta_{tk}$ and

$$\hat M_{tk} = \frac{1}{N_{tk}} \Big[ \sum_{i=1}^{n_{tk}} Y_{tki} + \sum_{i=n_{tk}+1}^{N_{tk}} x'_{tki}\hat\beta_{tk} \Big] = \hat\theta_{tk} + \frac{1}{N_{tk}} \sum_{i=1}^{n_{tk}} (Y_{tki} - x'_{tki}\hat\beta_{tk})$$

implying that in the usual case of small sampling rates within the areas, $\hat\theta_{tk}$ can also be
considered as an estimator of the finite population mean $M_{tk}$. For this reason we no longer
distinguish between the finite and superpopulation means.

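The notable feature of (2.1) is that the coefficients $\beta_{tk}$ are allowed to vary both
cross-sectionally and over time. The following equations specify the variation of the coefficients over
time:

$$\begin{bmatrix} \beta_{tkj} \\ \beta_{kj} \end{bmatrix} = T_j \begin{bmatrix} \beta_{t-1,kj} \\ \beta_{kj} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} \eta_{tkj}, \quad j = 0, 1, \ldots, p \qquad (2.3)$$


where we use the notation $\beta_{kj}$, j = 0, 1, ..., p, to define fixed coefficients which we interpret
below, and $T_j$ to define fixed (2 x 2) matrices and where the residuals $\{\eta_{tkj}\}$ satisfy


$$E(\eta_{tkj}) = 0, \quad E(\eta_{tkj}\eta_{tk\ell}) = \delta_{j\ell}, \quad E(\eta_{tkj}\eta_{t-d,k\ell}) = 0 \text{ for } d > 0. \qquad (2.4)$$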


The implication of (2.4) is that residuals of different coefficients pertaining to the same time 
t are allowed to be correlated but the serial and cross serial correlations are assumed to be zero. 


Next, we illustrate the use of (2.3) by considering some simple cases: 


(a) $T_j = \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}$ implies that $\beta_{tkj} = \beta_{kj} + \eta_{tkj}$ so that $\beta_{kj}$ represents, in this case, a common
mean. This is the well known Random Coefficient Regression Model (Swamy 1971) which
is often used in econometric applications. Obviously, by postulating $\mathrm{var}(\eta_{tkj}) = 0$, the
model reduces to the case of a fixed regression coefficient over time.


(b) $T_j = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ implies that $\beta_{tkj} = \beta_{t-1,kj} + \eta_{tkj}$ which is the familiar random walk model, see
e.g. Cooley and Prescott (1976) and LaMotte and McWhorter (1977) for application of
this model in econometric studies. In this case the coefficient $\beta_{kj}$ is redundant and should
be omitted so that $T_j = 1$.


(c) $T_j = \begin{bmatrix} \rho & 1 - \rho \\ 0 & 1 \end{bmatrix}$ implies the first order autoregressive relationship $(\beta_{tkj} - \beta_{kj}) = \rho(\beta_{t-1,kj} - \beta_{kj})
+ \eta_{tkj}$ considered by Rosenberg (1973a).


(d) $T_j = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ implies that $\beta_{tkj} = \beta_{t-1,kj} + \beta_{kj} + \eta_{tkj}$ which defines a local approximation to
a linear trend (Kitagawa and Gersch 1984). The coefficient $\beta_{kj}$ represents, in this case, a
fixed slope.
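
The four special cases can be checked numerically with a short sketch. The matrices below are written so as to reproduce the four stated recursions; the function names and the test values are ours, and `rho` stands for the autoregressive coefficient of case (c).

```python
def transition_matrix(case, rho=0.0):
    """2 x 2 matrix T_j acting on the pair (beta_{t-1,kj}, beta_kj)."""
    matrices = {
        "a": [[0.0, 1.0], [0.0, 1.0]],        # random coefficient around a common mean
        "b": [[1.0, 0.0], [0.0, 1.0]],        # random walk
        "c": [[rho, 1.0 - rho], [0.0, 1.0]],  # first order autoregression about the mean
        "d": [[1.0, 1.0], [0.0, 1.0]],        # local linear trend with a fixed slope
    }
    return matrices[case]


def step(T, beta_prev, beta_fixed, eta):
    """One update of (2.3): returns the pair (beta_tkj, beta_kj)."""
    beta_t = T[0][0] * beta_prev + T[0][1] * beta_fixed + eta
    fixed = T[1][0] * beta_prev + T[1][1] * beta_fixed  # the fixed coefficient is carried forward
    return beta_t, fixed


# With beta_{t-1} = 5, beta_kj = 2 and eta = 0.5:
print(step(transition_matrix("a"), 5.0, 2.0, 0.5))           # (2.5, 2.0)
print(step(transition_matrix("c", rho=0.5), 5.0, 2.0, 0.5))  # (4.0, 2.0)
```

In every case the second row of the matrix is (0, 1), so the fixed coefficient is simply copied from one period to the next and only the first row determines the dynamics.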

It should be emphasized that different matrices $T_j$ can be used for different coefficients $\beta_{tkj}$.
In fact, by defining $\alpha'_{tk} = (\beta_{tk0}, \beta_{k0}, \beta_{tk1}, \beta_{k1}, \ldots, \beta_{tkp}, \beta_{kp})$; $T = \mathrm{diag}[T_0, T_1, \ldots, T_p]$,
a block diagonal matrix with $T_j$ as the j-th block; $G = I_{p+1} \otimes \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ where $I_{p+1}$ is the identity
matrix of order p + 1 and $\otimes$ defines the Kronecker product and $\eta'_{tk} = (\eta_{tk0}, \eta_{tk1}, \ldots, \eta_{tkp})$,
the combined model holding for the coefficients $\beta_{tk}$ can be written as


$$\alpha_{tk} = T\alpha_{t-1,k} + G\eta_{tk}; \quad E(\eta_{tk}) = 0, \; E(\eta_{tk}\eta'_{t-d,k}) = \lambda_d \Delta \qquad (2.5)$$


where $\lambda_d = 1$ for d = 0 and $\lambda_d = 0$ otherwise, and $\Delta = [\delta_{j\ell}]$ is defined by the variances
and covariances $\delta_{j\ell}$ (equation 2.4).

The model defined by (2.5) specifies the variation of the regression coefficients of a specific
area over time. The common approach to account for cross-sectional relationships between
small area means is to allow for random small area effects which are time invariant $\{u_k\}$. The
general model defined by (2.1) and (2.3) includes this case by writing $Y_{tk} = 1_{n_{tk}} u_{tk} + X_{tk}\beta_{tk}
+ \epsilon_{tk} = X^*_{tk}\beta^*_{tk} + \epsilon_{tk}$, say, and specifying $u_{tk} = u_{t-1,k} + \eta_{tk}$ with $u_{0k} = 0$, $\mathrm{var}(\eta_{1k}) = \sigma_u^2$
and $\mathrm{var}(\eta_{tk}) = 0$ for t > 1 (compare with case (b) above). By assuming in addition the
autoregressive relationship defined by case (c) for the intercept variable and fixing the other
regression coefficients (case (a) with zero residual variances), the resulting model is similar to
the model considered by Choudhry and Rao (1989) except that in their general formulation
of the model the observation residuals of equation (2.1) are allowed to be serially correlated.
Notice that equation (2.1) now contains two random "intercept terms" but the model is
nonetheless identifiable. Choudhry and Rao assume that the only available data are the survey
estimators so that the estimation of the serial correlations needs to be carried out externally,
using the micro observations. Alternatively, a model accounting for the serial correlations can
be postulated. Choudhry and Rao assume an AR(1) model in their study.

A more general way to account for the cross-sectional relationships between the small area
means is to allow for non zero correlations between the residual terms $\eta_{tkj}$ and $\eta_{tmj}$ of the
models specifying the time series variation of the regression coefficients $\beta_{tkj}$ and $\beta_{tmj}$ operating
in areas k and m (equation 2.4). Often it is reasonable to assume that the correlations
decay as the distance between the areas increases. This can be formulated as $E(\eta_{tkj}\eta_{tmj}) =
\delta_{jj}\rho_j f_j(k,m)$, $k \ne m$, where $f_j(k,m)$ is a monotonic decreasing function of the distances
D(k,m). The case of geometrically decaying correlations is obtained by defining $f_j(k,m) =
\rho_j^{|k-m|-1}$. The case of fixed correlations is obtained by specifying $f_j(k,m) = 1$ and in what
follows we consider this case only. Allowing for fixed cross-sectional correlations for all the
regression coefficients can be formulated as


$$E(\eta_{tk}\eta'_{tm}) = D(\Delta)\Phi, \quad k \ne m \qquad (2.6)$$


where $D(\Delta)$ is the diagonal matrix with the variances $\delta_{jj}$ on the main diagonal and $\Phi$ is another
diagonal matrix composed of the correlations $\rho_j$.

Before concluding this section we present the model defined by (2.1), (2.5) and (2.6) in a
state-space form. Presenting the model in this form has important computational advantages.

Let $Y'_t = (Y'_{t1}, \ldots, Y'_{tK})$ represent the vector of observations of length $n_t = \sum_k n_{tk}$ for all
the areas at time t and let $\epsilon'_t = (\epsilon'_{t1}, \ldots, \epsilon'_{tK})$ represent the corresponding regression residuals.
Define $Z_{tk} = [1_{n_{tk}}, 0_{n_{tk}}, x_{tk1}, 0_{n_{tk}}, \ldots, x_{tkp}, 0_{n_{tk}}]$ where $0_{n_{tk}}$ is a vector of zeroes of length $n_{tk}$ and
$x_{tkj}$ is the vector of values for the j-th auxiliary variable, j = 1, ..., p. Let $Z_t$ be the block diagonal
matrix composed of the matrices $Z_{tk}$. The matrix $Z_t$ is of order $n_t \times [K \times 2 \times (p + 1)]$.
Define also $\alpha'_t = (\alpha'_{t1}, \ldots, \alpha'_{tK})$, $\eta'_t = (\eta'_{t1}, \ldots, \eta'_{tK})$, $\Sigma_t = \mathrm{Diag}[\sigma_1^2 I_{n_{t1}}, \ldots, \sigma_K^2 I_{n_{tK}}]$,
$\tilde T = I_K \otimes T$, and $\tilde G = I_K \otimes G$.
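
Using this notation, the model defined by (2.1), (2.5) and (2.6) can be written compactly
as


$$Y_t = Z_t\alpha_t + \epsilon_t; \quad E(\epsilon_t) = 0, \; E(\epsilon_t\epsilon'_t) = \Sigma_t \qquad (2.7)$$
$$\alpha_t = \tilde T\alpha_{t-1} + \tilde G\eta_t; \quad E(\eta_t) = 0, \; E(\eta_t\eta'_t) = \Lambda, \qquad (2.8)$$


where $\Lambda = [\Lambda_{k\ell}]$, $k, \ell = 1, \ldots, K$ with $\Lambda_{k\ell} = \Delta$ when $k = \ell$ and $\Lambda_{k\ell} = D(\Delta)\Phi$ when
$k \ne \ell$. The matrices $\Lambda_{k\ell}$ are (p + 1) x (p + 1).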
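
The block structure of the covariance matrix in (2.8) can be made concrete with a small sketch. It uses plain nested lists, and the function name, the argument names and the numerical values are hypothetical; `delta` plays the role of the within-area covariance matrix and `rho` holds the cross-sectional correlations, one per coefficient.

```python
def stacked_covariance(delta, rho, n_areas):
    """Build the K(p+1) x K(p+1) matrix with delta on the diagonal blocks and
    diag(delta_jj * rho_j) on every off-diagonal block."""
    p1 = len(delta)
    off_block = [[delta[j][j] * rho[j] if j == l else 0.0 for l in range(p1)]
                 for j in range(p1)]
    size = n_areas * p1
    lam = [[0.0] * size for _ in range(size)]
    for k in range(n_areas):
        for m in range(n_areas):
            block = delta if k == m else off_block
            for j in range(p1):
                for l in range(p1):
                    lam[k * p1 + j][m * p1 + l] = block[j][l]
    return lam


# Hypothetical example: p + 1 = 2 coefficients and K = 3 areas.
delta = [[1.0, 0.2], [0.2, 4.0]]
rho = [0.5, 0.25]
lam = stacked_covariance(delta, rho, n_areas=3)
```

Since the off-diagonal blocks are diagonal, a coefficient in one area is correlated only with the same coefficient in the other areas, which is the fixed-correlation case considered in the text.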


The model defined by (2.7) and (2.8) conforms to the classical state-space formulation,
see, e.g. Anderson and Moore (1979) and Harvey (1984). By this formulation, (2.7) is the
observation equation and (2.8) is the state equation with $\alpha_t$ defining the state vector. The
apparent advantage of restructuring the model in a state space form is that the vectors $\alpha_t$,
and hence the population means $\theta_{tk}$, as well as the estimation error variances can be estimated
conveniently by means of the Kalman filter. We discuss the use of the filter in
sections 3 and 4.


2.2 Explicit Estimators of the Small Area Means 


In order to illustrate how past and neighbouring data are used under the model to
"strengthen" the small area estimators we consider the case where the same vector $x_{tk}$ of
auxiliary values applies to all the units of a given area at a given time. In this case the
observation equation can be formulated in terms of the sample means, i.e.


$$\bar Y_{tk} = x'_{tk}\beta_{tk} + \bar\epsilon_{tk}; \quad E(\bar\epsilon_{tk}) = 0, \; E(\bar\epsilon^2_{tk}) = \sigma_k^2 / n_{tk}. \qquad (2.9)$$


Suppose that the regression coefficients follow a random walk (case (b) of equation 2.3)
so that for area k


$$\beta_{tkj} = \beta_{t-1,kj} + \eta_{tkj}; \quad E(\eta_{tkj}) = 0, \; E(\eta_{tkj}\eta_{tk\ell}) = \delta_{j\ell}, \quad j, \ell = 1, \ldots, p \qquad (2.10)$$

and for areas $k \ne m$,

$$E(\eta_{tkj}\eta_{tmj}) = \delta_{jj}\rho_j; \quad E(\eta_{tkj}\eta_{tm\ell}) = 0, \; j \ne \ell. \qquad (2.11)$$


The random walk model implies that the coefficients drift slowly away from their initial
value with no inherent tendency to return to a mean value. Obviously, for residuals $\eta_{tkj}$ such
that $E(\eta^2_{tkj}) = 0$ the corresponding regression coefficients are fixed over time. Notice also
that since $\beta_{tk} = \beta_{t-1,k} + \eta_{tk}$ the predictor of $\beta_{tk}$ at time (t - 1) is the same as the predictor
$\hat\beta_{t-1,k}$ of $\beta_{t-1,k}$.

Using the Kalman filter equations presented in section 3, it is shown in the Appendix that
the estimator $\hat\theta_{tk}$ of the small area mean $\theta_{tk}$ (equation 2.2) can be structured in this case in
the following form


$$\hat\theta_{tk} = x'_{tk}\hat\beta_{t-1,k} + \Big(1 - \frac{\sigma_k^2}{n_{tk}v_k}\Big)(\bar Y_{tk} - x'_{tk}\hat\beta_{t-1,k}) + \frac{\sigma_k^2}{n_{tk}v_k} \sum_{m=1,\, m \ne k}^{K} \gamma_{km}(\bar Y_{tm} - x'_{tm}\hat\beta_{t-1,m}) \qquad (2.12)$$


where the coefficients $\{\gamma_{km}\}$ are the partial regression coefficients in the regression of
$e_k = (\bar Y_{tk} - x'_{tk}\hat\beta_{t-1,k})$ against the prediction errors $\{e_m = (\bar Y_{tm} - x'_{tm}\hat\beta_{t-1,m})\}$ obtained
in the other areas and $v_k$ is the residual (unexplained) variance in the regression.

The estimator $\hat{\theta}_{tk}$ is composed of three components: the "synthetic" estimator $x'_{tk}\hat{\beta}_{t-1,k}$,
where $\hat{\beta}_{t-1,k}$ is the optimal predictor of $\beta_{tk}$ based on all the observations up to and including
time $t - 1$, the "correction factor" $(\bar{Y}_{tk} - x'_{tk}\hat{\beta}_{t-1,k})$ based on the prediction error in area
k, and an "adjustment factor" based on the prediction errors observed for the other areas.
The first two components correspond to the components of the classical small area estimators
discussed in the introduction. Notice that the smaller the sample size $n_{tk}$, the smaller is the
weight assigned to the current sample mean $\bar{Y}_{tk}$ in the estimation of $\theta_{tk}$ and the larger is the
weight assigned to the time series predictor $x'_{tk}\hat{\beta}_{t-1,k}$. The third component in the right hand
side of (2.12) represents the information borrowed from neighbouring areas. The weight
assigned to this component depends on the magnitude of the correlations $\rho_j$ between the
corresponding error terms $\{\eta_{tkj}\}$ in the models holding for the regression coefficients (equation
2.11). Obviously, when the regressions in the various areas are independent so that $\rho_j = 0$ for
all j and hence $\gamma_{km} = 0$ for all m, the third component vanishes and the predictor $\hat{\theta}_{tk}$ reduces
to a weighted average of the current mean $\bar{Y}_{tk}$ and the time series predictor $x'_{tk}\hat{\beta}_{t-1,k}$.
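The structure of (2.12) can be checked numerically in a deliberately small setting. The sketch below is only an illustration under stated assumptions: two areas, one regression coefficient per area, and residual variance $\sigma^2/n_{tk}$ for the sample means. The function names and the numbers are hypothetical and do not come from the paper. It computes the estimator once in the three-component form and once as a direct Kalman update, and the two results should agree.

```python
def innovation_vc(x, P_pred, n, sigma2):
    """V-C matrix of the two prediction errors: F = X P X' + diag(sigma2 / n)."""
    return [[x[a] * P_pred[a][c] * x[c] + (sigma2 / n[a] if a == c else 0.0)
             for c in (0, 1)] for a in (0, 1)]


def composite_estimate(k, x, b_prev, P_pred, ybar, n, sigma2):
    """Three-component form: synthetic part + own correction + adjustment from the other area."""
    m = 1 - k  # index of the other area
    e = [ybar[a] - x[a] * b_prev[a] for a in (0, 1)]  # prediction errors
    F = innovation_vc(x, P_pred, n, sigma2)
    gamma = F[k][m] / F[m][m]              # regression coefficient of e_k on e_m
    v = F[k][k] - F[k][m] ** 2 / F[m][m]   # residual variance of that regression
    w = sigma2 / (n[k] * v)
    return x[k] * b_prev[k] + (1.0 - w) * e[k] + w * gamma * e[m]


def kalman_estimate(k, x, b_prev, P_pred, ybar, n, sigma2):
    """Same quantity obtained as x_k times the Kalman-updated coefficient of area k."""
    e = [ybar[a] - x[a] * b_prev[a] for a in (0, 1)]
    F = innovation_vc(x, P_pred, n, sigma2)
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    F_inv = [[F[1][1] / det, -F[0][1] / det], [-F[1][0] / det, F[0][0] / det]]
    # Covariance between the error in theta_k and each prediction error.
    c = [x[k] * P_pred[k][a] * x[a] for a in (0, 1)]
    gain = [c[0] * F_inv[0][j] + c[1] * F_inv[1][j] for j in (0, 1)]
    return x[k] * b_prev[k] + gain[0] * e[0] + gain[1] * e[1]


# Hypothetical inputs: area 0 has a small sample, area 1 a large one.
x, b_prev, ybar, n, sigma2 = [1.0, 1.0], [10.0, 12.0], [11.0, 12.5], [4, 50], 1.0
P_corr = [[0.5, 0.25], [0.25, 0.5]]   # correlated coefficient errors
P_indep = [[0.5, 0.0], [0.0, 0.5]]    # independent areas
```

With `P_indep` the adjustment term drops out and the estimate for area 0 lies between the synthetic value and the sample mean, as the text describes.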


3. MODEL ESTIMATION AND INITIALIZATION 
USING THE KALMAN FILTER 


3.1 Estimation of the Regression Coefficients by Means of the Kalman Filter 


In this section we present the Kalman filter equations for the updating and smoothing
of the state vectors $\alpha_t$ defined by the equations (2.7) and (2.8) (the area regression
coefficients in our case). We assume that the V-C matrices $\Sigma_t$ and $\Delta$ are known. Estimation of
these matrices is considered in section 3.2. The theory of the Kalman filter is developed in
numerous publications (see e.g. Anderson and Moore 1979 and Meinhold and Singpurwalla 
1983) and so we restrict the discussion to aspects most germane to the small area estimation 
problem. 

Let $\hat{\alpha}_{t-1}$ be the best linear unbiased predictor (blup) of $\alpha_{t-1}$ based on all the data observed
up to time $(t - 1)$. Since $\hat{\alpha}_{t-1}$ is blup for $\alpha_{t-1}$, $\hat{\alpha}_{t|t-1} = T\hat{\alpha}_{t-1}$ is the blup of $\alpha_t$ at time
$(t - 1)$. Furthermore, if $P_{t-1} = E(\hat{\alpha}_{t-1} - \alpha_{t-1})(\hat{\alpha}_{t-1} - \alpha_{t-1})'$ is the V-C matrix of the
prediction errors at time $(t - 1)$, then $P_{t|t-1} = TP_{t-1}T' + G\Delta G'$ is the V-C matrix of the
prediction errors $(\hat{\alpha}_{t|t-1} - \alpha_t)$. (This follows straightforwardly from 2.8.)

When a new vector of observations $[Y_t, Z_t]$ becomes available, the predictor of $\alpha_t$ and the
V-C matrix $P_{t|t-1}$ are updated according to the formulae

$$\hat{\alpha}_t = \hat{\alpha}_{t|t-1} + P_{t|t-1}Z'_t F_t^{-1}(Y_t - \hat{Y}_{t|t-1})$$
$$P_t = (I - P_{t|t-1}Z'_t F_t^{-1}Z_t)P_{t|t-1} \qquad (3.1)$$

where $\hat{Y}_{t|t-1} = Z_t\hat{\alpha}_{t|t-1}$ is the blup of $Y_t$ at time $(t - 1)$ so that $e_t = (Y_t - \hat{Y}_{t|t-1})$ is the
vector of innovations with V-C matrix $F_t = (Z_t P_{t|t-1}Z'_t + \Sigma_t)$.
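The prediction and updating formulae can be sketched in a few lines of code. The following is a minimal pure-Python illustration, assuming matrices stored as lists of rows and vectors stored as one-column matrices. The helper names (`matmul`, `inverse`, `kalman_step`) are hypothetical, and the Gauss-Jordan inverse is a convenience for small examples rather than a recommendation for production use.

```python
def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B, sign=1.0):
    """Elementwise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inverse(A):
    """Gauss-Jordan inverse with partial pivoting (A is assumed nonsingular)."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]


def kalman_step(a_prev, P_prev, Y, Z, T, G, Delta, Sigma):
    """One prediction step followed by the update (3.1)."""
    a_pred = matmul(T, a_prev)                                   # T a_{t-1}
    P_pred = add(matmul(matmul(T, P_prev), transpose(T)),
                 matmul(matmul(G, Delta), transpose(G)))         # T P T' + G Delta G'
    e = add(Y, matmul(Z, a_pred), sign=-1.0)                     # innovations
    F = add(matmul(matmul(Z, P_pred), transpose(Z)), Sigma)      # Z P Z' + Sigma
    K = matmul(matmul(P_pred, transpose(Z)), inverse(F))         # gain P Z' F^{-1}
    a_new = add(a_pred, matmul(K, e))
    P_new = add(P_pred, matmul(matmul(K, Z), P_pred), sign=-1.0) # (I - K Z) P
    return a_new, P_new, e, F
```

The function also returns the innovations and their V-C matrix, since both are needed later for the likelihood in section 3.2.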

The new data observed at time ¢ can be used also for the updating (smoothing) of past 
estimators of the state vectors and hence for the updating of past estimators of the small area 
means. Denoting by $t^*$ the most recent month with observations, the smoothing is carried out
using the equations

$$\hat{\alpha}_{t|t^*} = \hat{\alpha}_t + P_t T' P_{t+1|t}^{-1}(\hat{\alpha}_{t+1|t^*} - T\hat{\alpha}_t)$$
$$P_{t|t^*} = P_t + P_t T' P_{t+1|t}^{-1}(P_{t+1|t^*} - P_{t+1|t})P_{t+1|t}^{-1}TP_t \qquad (3.2)$$

where $P_{t|t^*}$ is the V-C matrix of the prediction errors $(\hat{\alpha}_{t|t^*} - \alpha_t)$. Notice that $\hat{\alpha}_{t^*|t^*} = \hat{\alpha}_{t^*}$ and
$P_{t^*|t^*} = P_{t^*}$ define the starting values for the smoothing equations.
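The backward recursion is easiest to see in a scalar case. The sketch below assumes a one-dimensional random walk state observed with noise (so T = 1), with state variance `q` and observation variance `r`. These names and the example series are hypothetical. The filter runs forward and stores what the smoother needs, and the smoother then runs backward from the last filtered value.

```python
def filter_random_walk(y, q, r, a0, P0):
    """Forward Kalman filter for y_t = alpha_t + eps_t, alpha_t = alpha_{t-1} + eta_t."""
    a, P, P_pred = [], [], []
    a_t, P_t = a0, P0
    for obs in y:
        Pp = P_t + q                 # P_{t|t-1} with T = 1
        gain = Pp / (Pp + r)
        a_t = a_t + gain * (obs - a_t)
        P_t = (1.0 - gain) * Pp
        a.append(a_t)
        P.append(P_t)
        P_pred.append(Pp)
    return a, P, P_pred


def smooth_random_walk(a, P, P_pred):
    """Backward recursion in the spirit of (3.2); the last filtered values start it."""
    a_s, P_s = a[:], P[:]
    for t in range(len(a) - 2, -1, -1):
        J = P[t] / P_pred[t + 1]     # P_t T' P_{t+1|t}^{-1} with T = 1
        a_s[t] = a[t] + J * (a_s[t + 1] - a[t])
        P_s[t] = P[t] + J * (P_s[t + 1] - P_pred[t + 1]) * J
    return a_s, P_s


y = [1.0, 2.0, 1.5, 3.0]             # hypothetical series
a, P, P_pred = filter_random_walk(y, q=0.5, r=1.0, a0=0.0, P0=10.0)
a_s, P_s = smooth_random_walk(a, P, P_pred)
```

In this example the smoothed variances are never larger than the filtered ones, which reflects the extra information carried by the later observations.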

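Estimators of the small area means or aggregates of the means are obtained from the filtered
(or smoothed) estimators of $\alpha_t$ in a straightforward manner using the relationship
$\theta_{tk} = \bar{X}'_{tk}\beta_{tk} = \bar{z}'_{tk}\alpha_{tk} = \bar{z}'_{tk}A_k\alpha_t$, where $\bar{z}'_{tk} = (\bar{X}_{tk1}, 0, \bar{X}_{tk2}, 0, \ldots, \bar{X}_{tkp}, 0)$ and $A_k$ is the
appropriate indicator matrix. Hence, if $\theta_t^w = \sum_{k=1}^{K} w_k\theta_{tk}$ then $\theta_t^w = (\sum_{k=1}^{K} w_k\bar{z}'_{tk}A_k)\alpha_t = g'_{tw}\alpha_t$,
say. For given V-C matrices $\Sigma_t$ and $\Delta$, the MSEs of the estimation errors are obtained as


$$E(\hat{\theta}_{tk} - \theta_{tk})^2 = \bar{z}'_{tk}A_k P_t A'_k\bar{z}_{tk} \quad \text{and} \quad E(\hat{\theta}_t^w - \theta_t^w)^2 = g'_{tw}P_t g_{tw}. \qquad (3.3)$$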


Notice that the MSEs in (3.3) are with respect to the joint distribution of the observations
$\{Y_t\}$ and the vectors of coefficients $\{\alpha_{tk}\}$ so that they represent average MSEs over the
possible realizations of the area means.


3.2 Estimation of the V-C Matrices and Initialization of the Filter 


The actual application of the Kalman filter requires the estimation of the unknown elements
of the matrices $\Sigma_t$ and $\Delta$ and the initialization of the filter, that is, the estimation of the vector
$\alpha_0$ and the corresponding V-C matrix $P_0$ of the estimation errors. In this section we describe
simple estimation procedures which can be used for these purposes.

Assuming a normal distribution for the residual terms $\varepsilon_t$ and $\eta_t$ of equations (2.7) and (2.8),
the log likelihood function of the vectors $Y_{m+1}, \ldots, Y_N$, conditional on the first m vectors
$Y_1, \ldots, Y_m$, can be formulated as

$$L(\lambda) = \text{constant} - \frac{1}{2}\sum_{t=m+1}^{N}\left(\log|F_t| + e'_t F_t^{-1}e_t\right) \qquad (3.4)$$

where $\lambda$ contains the unknown model variances and covariances written in a vector form. The
scalar m defines the number of time periods needed to construct initial values for the Kalman
filter. (For the random walk model considered in section 2.2, m = 1, provided that sufficient
data are available in every area to allow the computation of the OLS estimators of the vectors of
coefficients.) The expression in (3.4) follows from the prediction error decomposition; see
Schweppe (1965) and Harvey (1981) for details. For given matrices $\Sigma_t$ and $\Delta$, the innovations
$e_t$ and the V-C matrices $F_t$ can be obtained by application of the Kalman filter equations (3.1).
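The prediction error decomposition translates directly into a loop over the innovations. The sketch below is an assumption-laden illustration for a scalar random walk state with m = 1: the filter is started from the first observation, with error variance equal to the observation variance `r`, which is what a noninformative start yields after one period in this simple case. The function name and the parameters `q` and `r` are hypothetical.

```python
import math


def log_likelihood(y, q, r):
    """Log likelihood of y[1:], conditional on y[0], up to the additive constant."""
    a, P = y[0], r                   # state estimate and variance after the first observation
    total = 0.0
    for obs in y[1:]:
        P_pred = P + q               # prediction variance of the state
        F = P_pred + r               # innovation variance F_t
        e = obs - a                  # innovation e_t
        total += math.log(F) + e * e / F
        gain = P_pred / F
        a += gain * e
        P = (1.0 - gain) * P_pred
    return -0.5 * total
```

Evaluating this function over a grid of `q` and `r` values gives a quick picture of the likelihood surface before any iterative maximization is attempted.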

The computation of the likelihood function requires the initialization of the Kalman filter 
which can be carried out most conveniently by application of the approach proposed by Harvey 
and Phillips (1979). By this approach, the nonstationary components of the state vector are 
initialized with very large error variances which corresponds to postulating a noninformative 
prior distribution so that the corresponding state estimates can conveniently be taken as zeroes. 
(For the random walk model, initializing with a noninformative prior yields the OLS estimators 
after one time period, see Meinhold and Singpurwalla 1983, for a Bayesian formulation of 
the Kalman filter). The stationary components of the state vector are initialized by the cor- 
responding unconditional means and variances which may be part of the unknown parameters 
defining the arguments of the likelihood function. 
Maximization of the likelihood function (3.4) can be implemented using the method of
scoring with a variable step length. In particular, let $\lambda_{(0)}$ define initial estimates of the
unknown elements in $\lambda$. Then the method of scoring consists of solving iteratively the set of
equations

$$\lambda_{(i)} = \lambda_{(i-1)} + r_i\{I[\lambda_{(i-1)}]\}^{-1}g[\lambda_{(i-1)}] \qquad (3.5)$$

where $\lambda_{(i-1)}$ is the estimator of $\lambda$ as obtained in the (i - 1)-th iteration, $I[\lambda_{(i-1)}]$ is the
information matrix evaluated at $\lambda_{(i-1)}$ and $g[\lambda_{(i-1)}]$ is the gradient of the log likelihood
evaluated at $\lambda_{(i-1)}$. The coefficient $r_i$ is a variable step length introduced to guarantee that
$L[\lambda_{(i)}] \geq L[\lambda_{(i-1)}]$ in every iteration. The value of $r_i$ can be determined by a grid search
procedure in the region [0,1]. The formulae for the k-th element of the gradient vector and
the $k\ell$-th element of the information matrix are given in Watson and Engle (1983).
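The iteration (3.5) with a grid search for the step length can be sketched for a single parameter. The example below is not the state-space likelihood of the paper: as a stand-in, it estimates the variance of a zero-mean normal sample, for which the gradient and information are known in closed form. The function names, the grid size and the data are all hypothetical.

```python
import math


def scoring(loglik, gradient, information, lam0, grid=10, max_iter=50, tol=1e-10):
    """Method of scoring for a scalar parameter with step length chosen on a grid in [0, 1]."""
    lam = lam0
    for _ in range(max_iter):
        direction = gradient(lam) / information(lam)
        # Step length 0 keeps the current value, so the likelihood can never decrease.
        best_lam, best_val = lam, loglik(lam)
        for s in range(1, grid + 1):
            candidate = lam + (s / grid) * direction
            value = loglik(candidate)
            if value > best_val:
                best_lam, best_val = candidate, value
        if abs(best_lam - lam) < tol:
            break
        lam = best_lam
    return lam


# Stand-in problem: variance of a zero-mean normal sample (hypothetical data).
data = [1.0, -2.0, 0.5, 1.5]
n, S = len(data), sum(v * v for v in data)


def loglik(lam):
    return -0.5 * n * math.log(lam) - S / (2.0 * lam)


def gradient(lam):
    return -n / (2.0 * lam) + S / (2.0 * lam * lam)


def information(lam):
    return n / (2.0 * lam * lam)


estimate = scoring(loglik, gradient, information, lam0=1.0)
```

For this stand-in problem the full step lands on the sample second moment S/n, so the iteration stops almost immediately. In the state-space setting the likelihood, gradient and information would each require passes of the Kalman filter.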

Having estimated the model variances and covariances, these estimates can be substituted 
for the true parameters in the Kalman filter equations (3.1) - (3.2) to yield the estimators of 
the regression coefficients and the V-C matrices and hence the small area estimators and their 
variances (see equation 3.3). Notice however that the estimated V-C matrices ignore the 
variability induced by the need to estimate the unknown elements contained in $\lambda$. Ansley and
Kohn (1986) propose correction factors of order 1/t* to account for this extra variation in state 
space modelling using first order Taylor approximations. Hamilton (1986) proposes a Monte 
Carlo procedure which consists of sampling from a multivariate normal distribution with mean 
given by the maximum likelihood estimator of the vector \ and V-C matrix defined by the 
inverse of the information matrix, and estimating the state vectors for each random realization 
of the parameter values. This procedure is more flexible in terms of the assumptions involved 
and provides further insight into the sensitivity of the Kalman filter estimators to errors in the 
variance and covariance estimators. However, it is computationally more intensive. 
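A Monte Carlo procedure of this kind can be sketched for a scalar random walk state with one unknown variance. Everything in the sketch is an assumption made for illustration: the names, the choice to discard non-positive variance draws, and the way the results are summarized as a "within" part (the average filter variance) plus a "between" part (the spread of the state estimates across draws).

```python
import random
import statistics


def filter_last(y, q, r):
    """Filtered state estimate and variance at the last time point (scalar random walk)."""
    a, P = y[0], r
    for obs in y[1:]:
        P_pred = P + q
        gain = P_pred / (P_pred + r)
        a += gain * (obs - a)
        P = (1.0 - gain) * P_pred
    return a, P


def monte_carlo_state_variance(y, q_hat, q_se, r, draws=200, seed=1):
    """Draw the state variance q from N(q_hat, q_se^2) and rerun the filter for each draw."""
    rng = random.Random(seed)
    estimates, variances = [], []
    while len(estimates) < draws:
        q = rng.gauss(q_hat, q_se)
        if q <= 0:
            continue                 # assumption: inadmissible draws are discarded
        a, P = filter_last(y, q, r)
        estimates.append(a)
        variances.append(P)
    within = statistics.mean(variances)        # average filter variance
    between = statistics.pvariance(estimates)  # extra variation due to parameter error
    return within, between, within + between


y = [1.0, 2.0, 1.5, 3.0, 2.5]        # hypothetical series
within, between, total = monte_carlo_state_variance(y, q_hat=0.5, q_se=0.2, r=1.0)
```

The spread of the state estimates across draws gives a direct view of how sensitive the filter output is to errors in the variance estimate.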


4. MODIFICATIONS TO PROTECT AGAINST 
MODEL BREAKDOWNS 


4.1 Description of the Problem and Proposed Modifications 


The use of a model for small area estimation seems inevitable in view of the small sample 
sizes within the areas. However, it raises the question of how to protect against model breakdowns.
Testing the model every time that new data become available is often not practical,
requiring instead the development of a ‘‘built-in mechanism’’ to ensure the robustness of the 
estimators when the model fails to hold. 

One possibility is to modify the regression estimators derived in the various time periods 
so that they satisfy certain linear constraints obtained by equating aggregate means of the raw 
data with their expected fitted values under the model. More precisely, we propose to augment 
the model equation (2.1) by linear constraints of the form 


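$$\sum_k W_{tk}^{(\ell)}\sum_i Y_{tki} = \sum_k W_{tk}^{(\ell)}\sum_i x'_{tki}\beta_{tk}, \quad \ell = 1, 2, \ldots, L(t), \quad t = 1, 2, \ldots \qquad (4.1)$$


where the coefficients $W_{tk}^{(\ell)}$ are fixed, standardized weights such that $\sum_k n_{tk}W_{tk}^{(\ell)} = 1$. An
example for such a constraint would be the equation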


Survey Methodology, December 1990 225) 


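$$\sum_{k=1}^{K} N_{tk}M_{tk} \Big/ \sum_{k=1}^{K} N_{tk} = \sum_{k=1}^{K} N_{tk}(\bar{x}'_{tk}\beta_{tk}) \Big/ \sum_{k=1}^{K} N_{tk} \qquad (4.2)$$


where $M_{tk}$ is the direct survey estimator in area k. For $\bar{x}_{tk} = \bar{X}_{tk}$, the equation (4.2)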
guarantees that the model dependent predictor of the aggregate population mean coincides 
with the corresponding survey estimator. Such a constraint can be justified by arguing that 
the survey estimators, although not reliable enough for estimating the small area means due 
to the small sample sizes, can be trusted when being combined for estimating the aggregate 
mean. Notice that ‘‘adding up”’ constraints are ordinarily imposed on statistical agencies 
anyway. Battese, Harter and Fuller (1988) and Pfeffermann and Barnard (1991) use a similar 
constraint for analysing cross-sectional surveys. Often, the small areas can be grouped into 
broader groups, with sufficient data in each of the groups to justify the use of the survey 
estimators for estimating the corresponding group means. In this case, one can impose several 
constraints of the form (4.2) where the summation is now over the areas belonging to the same 
group. Notice in this respect that in view of the correlations between the regression coefficients 
operating in the various areas, a constraint applied to a sub-set of the areas will modify the 
regression estimates in all the areas. We illustrate this property in the empirical study. 
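The link between the general form (4.1) and the example (4.2) can be made concrete with a small calculation. The sketch assumes that $N_{tk}$ denotes the population size of area k and that the direct estimator is the sample mean, in which case the weights $W_{tk} = N_{tk}/(n_{tk}\sum_k N_{tk})$ turn the weighted sum of sample totals into the population-weighted mean. The function names and numbers are hypothetical.

```python
def aggregate_weights(N, n):
    """Weights W_k = N_k / (n_k * sum(N)); they satisfy sum_k n_k W_k = 1."""
    total = sum(N)
    return [N_k / (n_k * total) for N_k, n_k in zip(N, n)]


def weighted_sum_of_totals(W, samples):
    """Left hand side of the constraint: sum_k W_k * (sum of observations in area k)."""
    return sum(w * sum(obs) for w, obs in zip(W, samples))


# Hypothetical population sizes, sample sizes and observations for two areas.
N = [1000, 3000]
samples = [[1.0] * 10, [2.0] * 30]
n = [len(s) for s in samples]
W = aggregate_weights(N, n)
aggregate = weighted_sum_of_totals(W, samples)
```

Here the aggregate equals (1000 x 1.0 + 3000 x 2.0) / 4000, that is, the population-weighted average of the two sample means.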

It is important to emphasize that the set of constraints in (4.1) does not represent external 
information about possible values of the regression coefficients. Rather, it serves as a ‘‘control 
system’’ to guarantee that the model estimators adjust themselves more rapidly to possible 
changes in the behavior of the regression coefficients. As a result, the variances of the modified 
regression estimators are slightly larger than the variances of the optimal estimators under the 
model. Obviously, when no such changes occur and the variances of the aggregate means are 
sufficiently small, one would expect the constraints to be satisfied approximately even without 
imposing them explicitly. As mentioned above, it is possible to incorporate several separate 
constraints in each time period but it is imperative that the variances of the corresponding 
aggregate means will be small enough to ensure that the modifications are indeed needed and 
do not interfere with the random fluctuation of the raw data. 


4.2 Inference Incorporating the Linear Constraints 


In Section 4.1 we proposed to amend the model equations (2.1) by imposing the set of 
constraints (4.1) thereby ensuring the robustness of the regression estimators against sudden 
drifts in the values of the coefficients. 

Computationally, this can be implemented most conveniently by augmenting the vectors
$Y_t$ of equation (2.7) by the scalars $\sum_k W_{tk}^{(\ell)}\sum_i Y_{tki}$, augmenting the matrices $Z_t$ by the
corresponding row vectors $(W_{t1}^{(\ell)}1'_{n_{t1}}Z_{t1}, \ldots, W_{tK}^{(\ell)}1'_{n_{tK}}Z_{tK})$ and setting the respective variances of
the residual terms to zero. The augmented set of equations, together with (2.8), form a pseudo
state-space model which could be estimated using the Kalman filter equations (3.1). Notice
that the pseudo V-C matrix $\Sigma_t^{(p)}$ of the augmented residual vector is no longer positive
definite (the last L(t) rows and columns of $\Sigma_t^{(p)}$ consist of zeroes) but this does not cause
computational difficulties.

The drawback of applying the Kalman filter to the pseudo model is that the V-C matrices 
of the regression estimators fail to account for the actual variability of the aggregate means 
appearing in the left hand side of (4.1). In order to deal with this problem, we propose to amend 
the formula for the updating of the V-C matrix P, (equation 3.1) so that the variances and 
covariances of the aggregate means will be taken into account. 


226 Pfeffermann and Burck: Time Series and Cross-Sectional Estimation 


Let $Y_t^{(a)}$ and $Z_t^{(a)}$ represent the augmented Y vector and Z matrix at time t and denote by
$\Sigma_t^{(a)}$ the actual V-C matrix of the residual terms $[Y_t^{(a)} - Z_t^{(a)}\alpha_t]$. The matrix $\Sigma_t^{(a)}$ is of
order $[n_t + L(t)]$ with $\Sigma_t$ in the first $n_t$ rows and columns and the variances and covariances
of the means $\sum_k W_{tk}^{(\ell)}\sum_i Y_{tki}$ among themselves and with the vector $Y_t$ in the remaining rows
and columns. Denoting by $\hat{\alpha}_{t-1}^{(a)}$ the robust predictor of $\alpha_{t-1}$ as obtained at time (t - 1) using
the pseudo model and by $P_{t-1}^{(a)}$ the actual V-C matrix of the errors $(\hat{\alpha}_{t-1}^{(a)} - \alpha_{t-1})$, the
modified state estimator at time t is obtained as


$$\hat{\alpha}_t^{(a)} = T\hat{\alpha}_{t-1}^{(a)} + P_{t|t-1}^{(a)}Z_t^{(a)\prime}(F_t^{(p)})^{-1}[Y_t^{(a)} - Z_t^{(a)}T\hat{\alpha}_{t-1}^{(a)}] \qquad (4.3)$$
where $P_{t|t-1}^{(a)} = (TP_{t-1}^{(a)}T' + G\Delta G')$ and $F_t^{(p)} = Z_t^{(a)}P_{t|t-1}^{(a)}Z_t^{(a)\prime} + \Sigma_t^{(p)}$ (compare with
3.1). It is shown in the Appendix that the actual V-C matrix $P_t^{(a)}$ of the errors $(\hat{\alpha}_t^{(a)} - \alpha_t)$
satisfies the recursive equation


$$P_t^{(a)} = [I - K_t^{(p)}Z_t^{(a)}]P_{t|t-1}^{(a)} + K_t^{(p)}[\Sigma_t^{(a)} - \Sigma_t^{(p)}]K_t^{(p)\prime} \qquad (4.4)$$


where $K_t^{(p)} = P_{t|t-1}^{(a)}Z_t^{(a)\prime}(F_t^{(p)})^{-1}$ is the pseudo Kalman gain. The first expression on the
right hand side of (4.4) corresponds to the usual updating formula of the Kalman filter (compare
with 3.1). The second expression is a correction factor which accounts for the actual variances
and covariances of the means $\sum_k W_{tk}^{(\ell)}\sum_i Y_{tki}$, not taken into account in the first expression.
The amended Kalman filter defined by the equations (4.3) and (4.4) produces robust predictors
$\hat{\alpha}_t^{(a)}$ instead of the optimal, model dependent predictors $\hat{\alpha}_t$, but otherwise uses the correct V-C
matrices under the model. Thus, this filter can be used for the routine estimation of the vectors
of coefficients and hence for the estimation of the small area means, and when the model holds
it will give similar results to those obtained under the optimal filter. In periods where the model
fails to hold, the updating formula (4.4) could be incorrect (depending on the particular model
failures) but the predictors $\hat{\alpha}_t^{(a)}$ will nonetheless satisfy the linear constraints (4.1). The
smoothing equations (3.2) can likewise be modified to satisfy the linear constraints.
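The amended updating formula can be checked numerically on a toy case. The sketch below assumes a two-dimensional state, one ordinary observation and one constraint row whose pseudo variance is zero. All names and numbers are hypothetical. It compares the amended formula with the general expression for the error covariance of a linear update with gain K, namely (I - KZ)P(I - KZ)' + K Sigma K' evaluated at the actual V-C matrix, and it also confirms that the constraint row is reproduced exactly by the fitted values.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def add(A, B, sign=1.0):
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def inv2(A):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]


def amended_update(P_pred, Z, Sigma_pseudo, Sigma_actual):
    """Pseudo gain and the amended V-C matrix: (I - K Z) P + K (Sigma_a - Sigma_p) K'."""
    F = add(matmul(matmul(Z, P_pred), transpose(Z)), Sigma_pseudo)
    K = matmul(matmul(P_pred, transpose(Z)), inv2(F))
    first = matmul(add(IDENTITY, matmul(K, Z), sign=-1.0), P_pred)
    correction = matmul(matmul(K, add(Sigma_actual, Sigma_pseudo, sign=-1.0)), transpose(K))
    return add(first, correction), K


def general_error_vc(P_pred, Z, K, Sigma_actual):
    """Error covariance of any linear update with gain K under the actual V-C matrix."""
    I_KZ = add(IDENTITY, matmul(K, Z), sign=-1.0)
    return add(matmul(matmul(I_KZ, P_pred), transpose(I_KZ)),
               matmul(matmul(K, Sigma_actual), transpose(K)))


P_pred = [[1.0, 0.3], [0.3, 2.0]]
Z = [[1.0, 0.0], [0.5, 0.5]]                  # second row plays the role of a constraint
Sigma_pseudo = [[0.4, 0.0], [0.0, 0.0]]       # zero variance for the constraint row
Sigma_actual = [[0.4, 0.1], [0.1, 0.05]]      # actual variances and covariances
P_amended, K = amended_update(P_pred, Z, Sigma_pseudo, Sigma_actual)
P_general = general_error_vc(P_pred, Z, K, Sigma_actual)
ZK = matmul(Z, K)
```

The second row of `ZK` equals (0, 1), so the fitted value of the constraint row coincides with its observed value whatever the data.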


5. EMPIRICAL RESULTS 


5.1 Description of the Data and Model Fitted 


In order to illustrate the important features of the class of models defined in Section 2, we 
fitted such a model to home sale prices in Jerusalem. The sale prices are recorded on a monthly 
basis and are routinely used by the Central Bureau of Statistics in Israel for the computation 
of monthly housing price indexes (HPI) adjusted for changes in quality. The HPI is computed 
separately for each city or group of cities and for each house size defined by the number of 
rooms, ranging from 1 to 5. The number of transactions carried out each month is very small 
in many of these cells and for 1 room apartments it occasionally happens that there are no 
transactions. The mean and standard deviation (S.D.) of the monthly number of transactions 
carried out during the period July 1987 - November 1989 are listed below. 
The need to adjust for changes in quality results from the fact that the transactions performed
are not under control, giving rise to large differences in quality from one month to the other
particularly in the small cells. The following quality measure variables (QMV) are recorded
for every transaction: $X^{(1)}$ - the apartment floor area, $X^{(2)}$ - the age of the apartment, $X^{(3)}$,
$X^{(4)}$ - dummy variables defining districts within the city.

The problems involved in the computation of the HPI and the method used in Israel are 
discussed at length in a recent article by Pfeffermann, Burck and Ben-Tuvia (1989). The 
following model was proposed by the authors as an alternative to the model in current use. 
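The triple index "tki" defines the i-th transaction of size k in month t with $Y_{tki}$ standing for
the log of the sale price and $x_{tki}^{(j)} = \log(X_{tki}^{(j)})$, j = 1, 2.


$$Y_{tki} = \beta_{tk0} + \beta_{tk1}x_{tki}^{(1)} + \beta_{tk2}x_{tki}^{(2)} + \beta_{tk3}x_{tki}^{(3)} + \beta_{tk4}x_{tki}^{(4)} + \varepsilon_{tki} \qquad (5.1)$$


$$\beta_{tk0} = \beta_{t-1,k0} + \beta_{k0} + \eta_{tk0}$$
$$\beta_{tkj} = \beta_{t-1,kj} + \eta_{tkj}, \quad j = 1, \ldots, 4, \qquad (5.2)$$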


with the error terms $\varepsilon_{tki}$ and $\eta_{tkj}$ satisfying the assumptions (2.1), (2.4) and (2.5). Notice that
the model assumed for the intercept term is the local approximation to a linear trend defined 
under case (d) of Section (2.1). The model assumed for the other coefficients is the random 
walk model defined under case (b). 

The regression defined by (5.1) forms the basis for the construction of an HPI 
adjusted for changes in quality. By fixing the values of the QMV’s at their average population 
values which are constant over time, (the values of these variables are adjusted approximately 
every five years), average sale prices can be computed using (5.1) and these averages are 
comparable between months since they refer to homes of similar qualities. 
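The index construction described above can be sketched as follows. The sketch rests on assumptions that are not in the text: the fitted log price at the fixed QMV values is transformed back with a plain exponential (no bias correction), the index is expressed relative to a base month set to 100, and the coefficient values are invented for the example.

```python
import math


def quality_adjusted_price(beta, x_fixed):
    """Price implied by the regression at fixed QMV values; beta[0] is the intercept."""
    log_price = beta[0] + sum(b * x for b, x in zip(beta[1:], x_fixed))
    return math.exp(log_price)


def price_index(betas_by_month, x_fixed, base=0):
    """Index of quality-adjusted prices, with the base month set to 100."""
    prices = [quality_adjusted_price(beta, x_fixed) for beta in betas_by_month]
    return [100.0 * p / prices[base] for p in prices]


# Hypothetical monthly coefficient estimates (intercept and two slopes)
# and hypothetical fixed QMV values.
betas_by_month = [
    [10.0, 0.8, -0.1],
    [10.0 + math.log(1.05), 0.8, -0.1],   # intercept shifted by a 5 percent price rise
]
x_fixed = [4.0, 2.5]
index = price_index(betas_by_month, x_fixed)
```

Because the QMV values are held fixed, the change in the index between the two months reflects only the change in the coefficients, here a rise of 5 percent.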

Pfeffermann, Burck and Ben-Tuvia discuss the considerations in selecting the model defined 
by (5.2) for the regression coefficients. They show empirical results which validate the fitness 
of the model. However, the results of that study were obtained by fitting the model to each 
cell separately, that is, without accounting for the cross-sectional relationships of the regres- 
sion coefficients. This aspect of the model is explored in the present study. Another major 
purpose of the empirical study is to illustrate the performance of the modifications proposed 
in Section 4 to protect against model breakdowns. 


5.2 Estimation of the Model 


The model defined by (5.1) and (5.2) can be put in a state-space form similar to (2.7) and (2.8).
In fact, the vectors $\alpha_{tk}$ and the matrices $Z_{tk}$, T and G assume, in this case, simple structures, since
for j = 1, ..., 4, $\beta_{kj} = 0$ (see case (b) of Section 2.1). Thus, $\alpha'_{tk} = (\beta_{tk0}, \beta_{k0}, \beta_{tk1}, \ldots, \beta_{tk4})$,
$Z_{tk} = [1_{n_{tk}}, 0_{n_{tk}}, x_{tk}^{(1)}, \ldots, x_{tk}^{(4)}]$, $T = [e_1, e_1 + e_2, e_3, \ldots, e_6]$, a 6 x 6 matrix with $e_j$
having a one in position j and zeroes elsewhere and $G = [e_1, e_3, \ldots, e_6]$ which is 6 x 5. The
matrix $\Delta$ is defined as in (2.5). The combined state vector $\alpha_t$ and the corresponding matrices of
the combined model are obtained from the vectors $\{\alpha_{tk}\}$ and the matrices $\{Z_{tk}\}$, T, G and $\Delta$
in the same way as in (2.7) and (2.8).
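The transition and disturbance matrices described above can be built directly from unit vectors. The sketch assumes the state of a cell is ordered as (intercept, fixed slope of the intercept, four regression coefficients). The helper names and the numerical state are hypothetical.

```python
def unit(j, dim=6):
    """Unit vector with a one in position j (1-based) and zeroes elsewhere."""
    return [1.0 if i == j - 1 else 0.0 for i in range(dim)]


def from_columns(columns):
    """Assemble a matrix (list of rows) from a list of column vectors."""
    return [list(row) for row in zip(*columns)]


e = {j: unit(j) for j in range(1, 7)}
e1_plus_e2 = [a + b for a, b in zip(e[1], e[2])]
T = from_columns([e[1], e1_plus_e2, e[3], e[4], e[5], e[6]])   # 6 x 6
G = from_columns([e[1], e[3], e[4], e[5], e[6]])               # 6 x 5


def transition(alpha, eta):
    """State equation for one cell: alpha_t = T alpha_{t-1} + G eta_t."""
    return [sum(t * a for t, a in zip(T[i], alpha)) + sum(g * h for g, h in zip(G[i], eta))
            for i in range(6)]


alpha_prev = [5.0, 0.1, 1.0, 2.0, 3.0, 4.0]
alpha_next = transition(alpha_prev, [0.0] * 5)
```

With zero disturbances the intercept moves from 5.0 to 5.1, the slope stays at 0.1, and the four regression coefficients are unchanged, which is the behaviour specified by (5.2).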

Having set the model in a state-space form we next attempted to estimate the unknown 
variances and covariances using the method of scoring algorithm described in Section 3.2. 
As it turned out, however, the computer time needed for convergence was way beyond the 
capacity of the IBM 1481 mainframe used for this study. Notice that the number of unknown 
parameters of the combined state-space model is $\dim(\lambda) = 25$ whereas the dimension of the
state vectors and hence the dimension of the corresponding V-C matrices is $\dim(\alpha_t) = 30$. The
total number of observations per month ranges from 55 to 353. The computer program written
for this study uses numerical derivatives so that each iteration of the method of scoring requires
a separate sweep through all the data with each sweep involving $[\dim(\lambda) + 1]$ computations
of the state vector $\alpha_t$ and the V-C matrix $P_t$ (equation 3.1) at each point in time. These
computations are needed in order to evaluate the log likelihood functions and hence the cor- 
responding derivatives. It is clear therefore that the computational costs increase with the length 
of the series, the number of observations, the size of the state vector and the number of unknown 
parameters. 
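The count of $[\dim(\lambda) + 1]$ filter computations per sweep is what one obtains with forward differences: one evaluation at the current parameter value and one for each perturbed parameter. Whether the original program used forward differences is an assumption. The sketch below uses a cheap toy function in place of the likelihood and simply counts the evaluations; all names are hypothetical.

```python
class CountingFunction:
    """Wraps a function and counts how many times it is evaluated."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, lam):
        self.calls += 1
        return self.func(lam)


def numerical_gradient(loglik, lam, h=1e-6):
    """Forward-difference gradient: dim(lam) + 1 evaluations of the likelihood."""
    base = loglik(lam)                      # one evaluation at the current point
    grad = []
    for j in range(len(lam)):
        shifted = list(lam)
        shifted[j] += h
        grad.append((loglik(shifted) - base) / h)   # one evaluation per parameter
    return grad


def toy_loglik(lam):
    """Stand-in for the likelihood; in practice each call is a full Kalman filter pass."""
    return -sum((v - 1.0) ** 2 for v in lam)


counted = CountingFunction(toy_loglik)
grad = numerical_gradient(counted, [0.0] * 25)
```

With 25 parameters the gradient costs 26 evaluations, and in the application each evaluation is a full pass of the filter over all months with a state vector of dimension 30.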

In order to deal with this problem we estimated the variance $\sigma^2$ (equation 2.1) and the
matrix $\Delta$ (equation 2.5) separately for each of the five apartment sizes using the time series
of observations corresponding to each size and then estimated the correlations $\rho_j$ (equation
2.6) by a crude, grid search procedure. We found that setting p; = 1% for every j gives satisfac- 
tory results both in terms of the behaviour of the innovations (the one step ahead prediction 
errors) and in terms of the smoothness of the regression coefficients corresponding to apart- 
ments of size one and five where the monthly sample sizes are very small. Notice that by 
estimating the variances and covariances defining the time series relationships of the regression 
coefficients separately for each size, one is more flexible in terms of the model assumptions 
although there is some loss of efficiency if the variances and covariances are indeed the same 
across the different sizes. 


5.3. Results 


Pfeffermann, Burck and Ben-Tuvia (1989) illustrate the adequacy of the time series models 
fitted to the various apartment sizes. As mentioned earlier, our purpose in this study is to 
compare the results obtained with and without the accounting for the cross-sectional correla- 
tions and to illustrate the performance of the modifications (4.1) in protecting against model 
breakdowns. 

In order to sharpen the comparisons as much as possible, we deliberately inflated the 
Y-values by 5 percent in each of the following four months: October 1987, November 1988, 
January 1989 and May 1989. Thus all the Y-values of all the apartment sizes corresponding 
to the months October 1987 - October 1988 were inflated by 5 percent, the Y-values correspon- 
ding to November 1988 - December 1988 were inflated by 10.25 percent (5 percent on top of 
the previous 5 percent) and so forth. These kinds of model breakdowns (although obviously 
not in such magnitudes) may result from intentional devaluations of the currency and are of 
main concern when modeling sale prices. See Pfeffermann, Burck and Ben-Tuvia for further 
discussion. Similar model breakdowns may occur, for example, with series of unemployment 
rates in periods of abrupt economic recessions. 
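The cumulative effect of the successive 5 percent inflations is a matter of compounding, which a short calculation makes explicit. The function name is hypothetical.

```python
def cumulative_inflation(n_shocks, rate=0.05):
    """Cumulative proportional increase after n_shocks successive inflations at the given rate."""
    return (1.0 + rate) ** n_shocks - 1.0


# Percent increase after one, two, three and four inflations of 5 percent each.
percent_by_shock = [round(100.0 * cumulative_inflation(s), 2) for s in range(1, 5)]
```

After the second inflation the cumulative increase is 10.25 percent, as stated in the text, and after all four it is about 21.55 percent.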

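Table 1 shows the average mean squared errors (AMSE) of the model residuals
$\hat{\varepsilon}_{tki} = (Y_{tki} - \hat{\beta}_{tk0} - \sum_{j=1}^{4}x_{tki}^{(j)}\hat{\beta}_{tkj})$ and the model innovations
$e_{tki} = [Y_{tki} - (\hat{\beta}_{t-1,k0} + \hat{\beta}_{k0}) - \sum_{j=1}^{4}x_{tki}^{(j)}\hat{\beta}_{t-1,kj}]$ (see equations 5.1 and 5.2), separately for each of the five apartment sizes.
The AMSEs were computed as $AMSE_k(\hat{\varepsilon}) = \frac{1}{N}\sum_{t=1}^{N}(\frac{1}{n_{tk}}\sum_{i=1}^{n_{tk}}\hat{\varepsilon}_{tki}^2)$;
$AMSE_k(e) = \frac{1}{N}\sum_{t=1}^{N}(\frac{1}{n_{tk}}\sum_{i=1}^{n_{tk}}e_{tki}^2)$ where t = 1, ..., N indexes the months of July 1987 - November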
1989. We distinguish between four different estimators of the regression coefficients as defined 
by whether the model accounts for the cross-sectional correlations (op; = 2), (o; = 0) and 
by whether or not the estimators are modified to protect against the model breakdowns 
(abbreviated as ‘‘Rob. Inc.’”’ and ‘‘No Rob.”’ in the table). The modifications were carried 
out by augmenting the observation equation of each month by three linear constraints of the 
form 4.2. These constraints forced the aggregate means of the fitted values in each of the three 
Table 1


Average Mean Squared Errors of Residuals and Innovations With and Without
the Accounting for Cross-sectional Correlations and the Inclusion of the
Robustness Modifications, by Size

          Mean Squared Errors of Innovations              Mean Squared Errors of Residuals

          With correlations     $\rho_j = 0$              With correlations     $\rho_j = 0$
Size      Rob. Inc.  No Rob.    Rob. Inc.    No Rob.      Rob. Inc.    No Rob.  Rob. Inc.  No Rob.

1         .141       .134       .176         .218         .021         .027     .056       .092
2         .070       .090       .084         .123         .021         .039     .023       .070
3         .065       .090       .070         [illegible]  [illegible]  .042     .019       .143
4         .067       .123       .072         .198         .019         .066     .021       .141
5         .067       .114       [illegible]  [illegible]  .023         .033     .065       .106


districts to coincide with the corresponding means of the observed values. When incorporating 
the constraints, the model was fitted using the amended Kalman filter as defined by the 
equations (4.3) and (4.4). 
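The AMSE entries of Table 1 are averages over months of the within-month mean squared errors. A minimal sketch of that computation is given below. The function name and data are hypothetical, and skipping months without transactions is an assumption made here because such months occur for 1 room apartments.

```python
def amse(errors_by_month):
    """Average over months of the monthly mean squared error for one apartment size."""
    monthly = []
    for errors in errors_by_month:
        if not errors:
            continue                  # assumption: months with no transactions are skipped
        monthly.append(sum(err * err for err in errors) / len(errors))
    return sum(monthly) / len(monthly)


# Hypothetical residuals for three months; the second month has no transactions.
example = amse([[1.0, -1.0], [], [2.0]])
```

Each month receives the same weight in the average, regardless of the number of transactions observed in it.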

In order to illustrate the performance of the four sets of regression estimators in the various 
months and in particular, in and around the months where we inflated the data, we plotted 
the monthly MSE’s of the innovations and residuals as obtained for 3 and 5 room apartments. 
The plots are shown in Figures 1 to 4. Notice that the values of Table 1 for 3 and 5 room 
apartments are correspondingly the averages of the values shown in the four figures. 


The main conclusions from the table and the graphs are as follows: 

Accounting for the cross-sectional correlations and including the linear constraints to pro- 
tect against the model breakdowns yields better results than in the other cases considered. This 
outcome is most prominent in the cells of 1 and 5 room apartments where the sample sizes in 
each month are very small. In the other three cells, there are only small differences between 
the case (9 = %, Rob. Inc.) and the case (o = 0, Rob. Inc.) which could be expected since 
as the number of observations in each month increases, there is less borrowing of information 
from neighbouring cells (small areas in the more general context). The situation is different, 
however, when the linear constraints are removed. Accounting for the cross-sectional correla- 
tions yields in this case much better results than when not accounting for them and this is true 
for all the apartment sizes. Thus, by borrowing information from one cell to the other, the 
estimators of the regression coefficients adapt themselves much more rapidly to the sudden 
drifts in the data, as seen also more directly in the figures. The four peaks in each graph are
in the months where the data were inflated and as can be seen, the graphs corresponding to 
the case(o = %, No Rob.) return to their normal level of the months before the inflation much 
faster than the graphs representing the case (9 = 0, No Rob.) 

Another interesting comparison is between the case where the linear constraints are included 
and the case where they are not. Clearly, the inclusion of the constraints improves the results 
substantially when accounting for the cross-sectional correlations and the improvements are even more
prominent when the cross-sectional correlations are set to zero. It is interesting to compare in this context
the figures exhibiting the monthly MSE’s of the innovations with the figures exhibiting the 
monthly MSE’s of the residuals. In the four months where we inflated the data the MSE’s of 
the innovations are high which is obvious since the innovations are the differences between 
the observations and their predictors from previous months. Still, when the linear constraints 
are included, the MSE’s return to their normal level right after the months of inflation. As 
Figure 1 Monthly Mean Squared Errors of Innovations, 3 Room Apartments

Figure 2 Monthly Mean Squared Errors of Residuals, 3 Room Apartments

Figure 3 Monthly Mean Squared Errors of Innovations, 5 Room Apartments

Figure 4 Monthly Mean Squared Errors of Residuals, 5 Room Apartments

(Each of Figures 1 to 4 shows four curves: with and without the cross-sectional correlations, each with the robustness modifications included (Rob. Inc.) and without them (No Rob.).)

for the residuals, once the linear constraints are included, there is practically no increase in
the MSE values in the months of inflation in the case of 3 room apartments and, when
accounting for the cross-sectional correlations, only a slight increase in the case of 5 room apartments.
However, when ignoring the cross-sectional correlations, the residual MSEs for 5 room apartments
are much larger in the months of inflation than in the other months even when imposing the
constraints. This outcome has a simple explanation. The linear constraints are imposed on the 
aggregate means of the fitted values in each district but since the number of observations in 
5 room apartments is a small fraction of the total number of observations, the constraints alone 
have a relatively small effect on the estimated regression coefficients in this cell. On the other 
hand, the constraints have a large effect on the estimated coefficients in the other cells so that 
when accounting for the cross-sectional correlations, the estimators corresponding to 5 room 
apartments are also modified since they are correlated with the other coefficients. 

The way by which the linear constraints protect against sudden drifts in the data is illuminated 
in Figure 5 where we plotted the monthly intercept estimates for 3 room apartments. 

As can be seen, with the linear constraints included, the intercept adapts itself to the new 
level of the data in the same month that the inflation occurs. Without the inclusion of the 
constraints, the adaptation to the new level of the data takes several months. The plot of the
monthly intercept estimates of 5 room apartments does not have this nice pattern since with 
the small sample sizes observed each month, the effect of the inflation is to alter also the other 
regression coefficients. 


Figure 5 Monthly Estimates of Intercept, 3 Room Apartments

(Horizontal axis: months, Jul. 87 to Nov. 89. Two curves, both accounting for the cross-sectional correlations: Rob. Inc. and No Rob.)

Figure 6 Variances of Estimators of Cell Means ($\times 10^4$), 3 Room Apartments

Figure 7 Variances of Estimators of Cell Means ($\times 10^4$), 5 Room Apartments

Our discussion so far centered on the empirical distribution of the model residuals and
innovations. A major application of small area estimation is the prediction of the small area 
means (equation 2.2). Clearly, when a model yields residuals with well behaved properties it 
can also be expected to yield good estimators for the population means. Nevertheless, it is 
interesting to compare the theoretical variances of the small area means estimators as obtained 
with and without the accounting for the cross-sectional correlations, under the model which 
accounts for these correlations with p; = 2. This comparison permits the assessment of the 
loss in efficiency when the cross-sectional correlations are ignored.

Figures 6 and 7 show the monthly variances of the cell mean estimators as obtained for 3 
and 5 room apartments. (The variances have been multiplied by $10^4$.) The figure for 3 room
apartments also contains the variances of the ordinary least squares (OLS) estimators of the 
population means, that is, the variances of the estimators when estimating the regression 
coefficients in each month by OLS. These estimators are not operational in the case of 5 room 
apartments because of the very small monthly sample sizes. 

The important conclusion drawn from the two figures is that by accounting for the 
cross-sectional correlations the variances of the resulting estimators can be reduced quite 
substantially, depending on the sample sizes. This is obviously so for 5 room apartments 
but is also true for 3 room apartments despite the fact that the sample sizes in these cells 
are relatively very large. The large sample sizes ordinarily obtained for 3 room apartments make 
the OLS estimators quite comparable to the estimators obtained when ignoring the cross- 
correlations in the estimation of the population means. Notice however the big gap between 
the variance of the OLS estimator and the variance of the other two estimators in October 1987. 
In this month there were only 10 observations of 3 room apartments and it is here where the 
use of the past data has its main impact even when ignoring the cross-sectional correlations. 
(The number of observations for 3 room apartments in November 1987 is 28; in all the other 
months there are at least 46 observations.) 

Another important outcome arising from the two figures is the much greater stability of 
the variances of the optimal estimators under the model as compared to the variances of the 
estimators which ignore the cross-sectional correlations. Notice in this respect that the 
differences in the variances from one month to the other depend not only on the sample sizes 
in each month but also on the values of the explanatory variables (the design matrix) and the 
amount of past data observed. Still, it is the sample sizes which mostly explain the differences 
in the variances of the estimators, particularly towards the end of the series. 
APPENDIX 


a) Derivation of Equation (2.12) 


When Xi = Xx» On = XB = ZieG 80 that O, = (Oy, ..., 6x) ‘= 2,0. 
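Also, for the random walk model the matrix T is the identity matrix and by equation (3.1) 


$$Z_t\hat{\alpha}_t = Z_t\hat{\alpha}_{t-1} + (Z_t P_{t|t-1} Z_t') F_t^{-1} (Y_t - Z_t\hat{\alpha}_{t-1}) = (I - \Sigma_t F_t^{-1}) Y_t + \Sigma_t F_t^{-1} Z_t\hat{\alpha}_{t-1} \tag{A1}$$


since $F_t = (Z_t P_{t|t-1} Z_t' + \Sigma_t)$. Suppose for convenience that k = 1 and define 


$$F_t = \begin{pmatrix} f_{11} & f_1' \\ f_1 & F_{22} \end{pmatrix} \quad \text{and} \quad H_t = F_t^{-1} = \begin{pmatrix} h_{11} & h_1' \\ h_1 & H_{22} \end{pmatrix},$$


where $f_{11}$ and $h_{11}$ are scalars, $f_1'$ and $h_1'$ are $[1 \times (K - 1)]$ and $F_{22}$ and $H_{22}$ are $[(K - 1) \times (K - 1)]$. 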


Using this notation, it follows from (A1) that 


i a? Ofek hi 

On = (1 - aL Yn a Fn (ei 1,1) <a ye hy —— ex (A2) 
Nyy Nt poo hy 

Let yi = (N12. ---» ik) = Si Fr! defines the partial regression coefficients in the regression 


of @ pon(e5n oA. Pe pyrand 'y? Sf, fi F5: Ji) define the residual variance in the 
regression. 


Equation (2.12) follows directly from (A2) since 
$$f_1' F_{22}^{-1} = -h_1'/h_{11}; \qquad (f_{11} - f_1' F_{22}^{-1} f_1)^{-1} = h_{11} \tag{A3}$$
by well known properties of the inverse of a partitioned matrix. 


b) Derivation of Equation (4.4) 


By (4.3), 
QO = (I KOZ) TEA) + ROY. (A4) 
Hence, 
a”) a (I aad Ke Z”) (Ta;4} = 1) ae ee ie = Zia). (A5) 
The prediction errors (7@‘4) — q,) are independent of the residuals (Y{4) — Z{4)q,) and so, 


PIA) = EL — 21) (G4 — )'] = WPI LO + KIPLIOKP” (6) 


where we denote for convenience Q, = (I — K{’) Z}4)). 
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By definition of the matrix F{” (see below 4.3), equation (A6) can be written in the form 
PI? = OPO, — PRR ZO KP + KP FO KE” 
PRR Sy SOK: (A7) 


which implies the relationship (4.4) by straightforward algebra. 
A Method for the Analysis of Seasonal 
ARIMA Models 


DAVID A. BINDER and J. PETER DICK¹ 


ABSTRACT 


A commonly used model for the analysis of time series is the seasonal ARIMA model. However, 
the survey errors of the input data are usually ignored in the analysis. We show, through the use of state- 
space models with partially improper initial conditions, how to estimate the unknown parameters of this 
model using maximum likelihood methods. As well, the survey estimates can be smoothed using an 
empirical Bayes framework and model validation can be performed. We apply these techniques to an 
unemployment series from the Labour Force Survey. 


KEY WORDS: Kalman filter; Partial likelihood; Data smoothing. 


1. INTRODUCTION 


It is common practice to analyze data from surveys where similar data items are collected 
on repeated occasions, using time series analysis methods. Most standard methods for these 
analyses assume the data are either observed without error or have independent measurement 
errors. However, in the analysis of repeated survey data, when there are overlapping sampling 
units between occasions, the survey errors can be correlated over time. 

A commonly used model in the analysis of time series is the seasonal integrated 
autoregressive-moving average (ARIMA) regression model, which we discuss in this paper. 
We show how to incorporate the (possibly correlated) survey errors into the analysis. In par- 
ticular, we consider the case where the survey (design) error can be assumed to be an ARMA 
process up to a multiplicative constant. 

When such a model for the behaviour of the population characteristics is assumed, the 
minimum mean squared error, or, equivalently, the Bayes linear estimator for the characteristic 
at a point in time can be derived. This estimator incorporates the model structure which the 
classical estimators, such as the minimum variance linear unbiased estimators, ignore. When 
the model parameters are estimated from the survey data, the estimators are empirical Bayes. 

Blight and Scott (1973), Scott and Smith (1974), Scott, Smith and Jones (1977), Jones (1980), 
Rao, Srinath and Quenneville (1989) and others considered the implications of certain stochastic 
models for the population means over time. Hausman and Watson (1985) incorporate a 
measurement error model into the standard seasonal adjustment process. Miazaki (1985) 
assumed that the survey error could be modelled with a pure moving average process. In Binder 
and Dick (1989), these results were generalized using state space models and Kalman filters. 
In this paper, we extend the framework to include the model where differencing of the original 
series of the population means yields an ARMA model. We use the modified Kalman filter 
approach given by Kohn and Ansley (1986). To estimate the unknown parameters, we max- 
imize the marginal likelihood function using the method of scoring. This approach can also 
handle missing data routinely. We also show how the survey estimates can be smoothed to incor- 
porate the model features using empirical Bayes methods. Confidence intervals for these 
smoothed values are also given, using the method described by Ansley and Kohn (1986). Bell 
and Hillmer (1987) used a similar model but their initial conditions do not extend easily to 
include regression terms or missing values (while preserving the marginal likelihood approach). 

An example of this model is described in Section 5 using unemployment data from the 
Canadian Labour Force Survey. This example shows the implications on the estimates of the 
model parameters when the survey errors are taken into account. We derive a smoothed estimate 
of the underlying process under the model assumptions. Recursive residuals are produced and 
validation techniques are used to evaluate the various models. 


2. THE MODEL 


Suppose we have a series of point estimates from a repeated survey of a population 
characteristic, given by $y_1, y_2, \ldots, y_T$. We assume that $y_t$ can be decomposed into three 
components, so that 


$$y_t = x_t'\gamma + \theta_t + e_t, \tag{2.1}$$


where $x_t'\gamma$ is a deterministic regression term, $\theta_t$ is a population parameter following a time 
series model, and $e_t$ is the survey error, assumed to have zero expectation. 

We first describe an integrated seasonal autoregressive-moving average model for $\{\theta_t\}$. We 
let $B$ be the backshift operator; $\nabla = 1 - B$ and $\nabla_s = 1 - B^s$, where $s$ is the seasonal period. 
We define the following polynomial functions: 


$$\lambda(B) = 1 - \lambda_1 B - \lambda_2 B^2 - \ldots - \lambda_P B^P,$$

$$\alpha(B) = 1 - \alpha_1 B - \alpha_2 B^2 - \ldots - \alpha_p B^p,$$

$$\nu(B) = 1 - \nu_1 B - \nu_2 B^2 - \ldots - \nu_Q B^Q,$$
and 

$$\beta(B) = 1 - \beta_1 B - \beta_2 B^2 - \ldots - \beta_q B^q.$$


The seasonal ARIMA $(p,d,q)(P,D,Q)_s$ model for $\{\theta_t\}$ is given by 
$$\lambda(B^s)\alpha(B)\nabla^d \nabla_s^D \theta_t = \nu(B^s)\beta(B)\varepsilon_t, \tag{2.2}$$


where the $\varepsilon_t$'s are independent $N(0,\sigma^2)$. We define $a(B) = \lambda(B^s)\alpha(B)$, a $(p + sP)$-degree 
polynomial; $\Delta(B) = \nabla^d \nabla_s^D$, a $(d + sD)$-degree polynomial; $b(B) = \nu(B^s)\beta(B)$, a 
$(q + sQ)$-degree polynomial; $A(B) = a(B)\Delta(B)$, a $(p + d + sP + sD)$-degree polynomial; 
$u_t = \Delta(B)\theta_t$, an ARMA$(p + sP, q + sQ)$ process. Therefore, alternative representations 
of (2.2) are 


$$a(B)\Delta(B)\theta_t = b(B)\varepsilon_t, \tag{2.3}$$


$$A(B)\theta_t = b(B)\varepsilon_t, \tag{2.4}$$
and 
$$a(B)u_t = b(B)\varepsilon_t. \tag{2.5}$$
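
The polynomial products behind representations (2.3) to (2.5) can be illustrated numerically. The following is a minimal sketch, not code from the paper: it assumes polynomials are stored as coefficient arrays in ascending powers of B, and the function names (`poly_in_power`, `difference_poly`, `sarima_polys`) are hypothetical.

```python
import numpy as np


def poly_in_power(coefs, step):
    """Coefficients of 1 - c_1 B^step - c_2 B^(2*step) - ... in ascending powers of B."""
    out = np.zeros(step * len(coefs) + 1)
    out[0] = 1.0
    for j, c in enumerate(coefs, start=1):
        out[j * step] = -c
    return out


def difference_poly(d, D, s):
    """Coefficients of (1 - B)^d (1 - B^s)^D in ascending powers of B."""
    out = np.array([1.0])
    seasonal = np.zeros(s + 1)
    seasonal[0], seasonal[s] = 1.0, -1.0
    for _ in range(d):
        out = np.convolve(out, [1.0, -1.0])
    for _ in range(D):
        out = np.convolve(out, seasonal)
    return out


def sarima_polys(alpha, lam, beta, nu, d, D, s):
    """Return the combined AR, differencing, full AR and MA polynomials."""
    a = np.convolve(poly_in_power(lam, s), poly_in_power(alpha, 1))
    delta = difference_poly(d, D, s)
    A = np.convolve(a, delta)          # degree p + d + sP + sD
    b = np.convolve(poly_in_power(nu, s), poly_in_power(beta, 1))
    return a, delta, A, b
```

For an ARIMA (1,1,0)(0,1,1) model with s = 12, calling `sarima_polys([0.5], [], [], [0.3], 1, 1, 12)` gives a full autoregressive polynomial of degree 14, in agreement with the degree p + d + sP + sD stated above. The stored arrays are raw polynomial coefficients, so the coefficient of B^j in the array is the negative of the corresponding $A_j$ when $A(B)$ is written as $1 - A_1 B - \ldots$.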
We now consider the survey errors $\{e_t\}$ of expression (2.1). It will be assumed that the 
sample sizes of the repeated survey are sufficiently large that the errors for the survey estimates 
can be approximated by a multivariate normal distribution. In the simplest case, where the 
surveys are non-overlapping and the sampling fractions are small, the $e_t$'s can be assumed to 
be independent. In a rotating panel survey, the survey errors are usually correlated. In this case, 
since the correlations between survey occasions are zero after panels have been rotated out, 
a pure moving average process can be used to describe the survey error process. 

Alternatively, if a random sample of units is replaced on each survey occasion, a pure 
autoregressive process may best describe the process. More complicated models are also 
possible. For example, in a two-stage design, some of the first stage units may be replaced ran- 
domly on each occasion and the second stage units may have a rotating panel design. This might 
be approximated by an autoregressive-moving average process, as suggested by Scott, Smith 
and Jones (1977). 

In this paper, we assume that the survey error process is given by 


$$e_t = k_t w_t, \tag{2.6}$$


where $\{w_t\}$ is an ARMA $(m,n)$ process, given by 


$$\phi(B)w_t = \psi(B)\eta_t \tag{2.7}$$
and 
$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \ldots - \phi_m B^m,$$
and 
$$\psi(B) = 1 - \psi_1 B - \psi_2 B^2 - \ldots - \psi_n B^n.$$


The $\eta_t$'s are independent $N(0,\tau^2)$. The factor $k_t$ has been included in (2.6) to allow for 
non-homogeneous variances when the autocorrelation function is homogeneous in time. 

In the model just described we assume that $\tau$, the $k_t$'s and the coefficients of $\phi(B)$ and 
of $\psi(B)$ can be estimated directly from the survey data, using design-based methods. However, 
in general, the other parameters are unknown. This includes $\gamma$, $\sigma^2$, and the coefficients 
of $\lambda(B)$, $\alpha(B)$, $\nu(B)$ and of $\beta(B)$. The $x_t$'s in the regression term are assumed known. 


3. STATE SPACE FORMULATION OF THE MODEL 


3.1 General Formulation 


The model described in Section 2 can be formulated as a state space model with partially 
improper priors. This has a number of advantages. It permits, through use of a modified 
Kalman filter, calculation of a marginal likelihood function, which can be maximized to 
estimate unknown parameters. It also accommodates smoothing of the original survey 
estimates, by removing the estimates of survey error from the data. 

In the state space model, two processes occur simultaneously. The first process, the obser- 
vation system, details how the observations depend on the current state of the process 
parameters. The second process, the transition system, details how the parameters evolve over 
time. 




For the state space models we consider here, the observation equation is written as 
$$y_t = h_t' z_t \tag{3.1a}$$


and the transition equation is 


$$z_t = F z_{t-1} + G\varepsilon_t, \tag{3.1b}$$


where $z_t$ is an $(r \times 1)$ state vector and $h_t$ is a fixed $(r \times 1)$ vector. In the transition 
equation, $F$ is a fixed $(r \times r)$ transition matrix, $G$ is a fixed $(r \times m)$ matrix and the $\varepsilon_t$'s are 
independent normal vectors with mean zero and covariance $U$. 

The final requirement to complete the specification of the state space process is the initial 
conditions for $z_0$. In this paper, we shall use the improper prior formulation given in Kohn 
and Ansley (1986). In general, we assume that $z_0$ has a partially diffuse $r$-variate normal 
distribution with mean $m(0 \mid 0) = 0$ and covariance matrix $V(0 \mid 0)$, where 


$$V(0 \mid 0) = \kappa V_1(0 \mid 0) + V_0(0 \mid 0) \tag{3.2}$$


for large $\kappa$. The matrix $V_1(0 \mid 0)$ specifies the diffuse part of the prior. We explain in Section 
3.2 how to obtain $V_1(0 \mid 0)$ and $V_0(0 \mid 0)$ for our model. 

We denote the conditional mean of $z_t$ given the observations up to and including time $t'$ 
by $m(t \mid t')$, and the conditional variance by $V(t \mid t')$, where 


$$V(t \mid t') = \kappa V_1(t \mid t') + V_0(t \mid t'). \tag{3.3}$$


Recursive formulae for the cases where $t = t'$ and $t = t' + 1$ are given in Kohn and Ansley 
(1986). They refer to this as the modified Kalman filter. 

Since the model for $\{y_t\}$ given by (2.1) contains survey errors $\{e_t\}$, an estimate of the 
components without survey error, given by 


$$y_t(\text{smoothed}) = x_t'\gamma + \theta_t, \tag{3.4}$$


is often of interest. When the right hand side of (3.4) can be expressed as $g_t' z_t$, for some $g_t$, 
then it is possible to obtain the conditional mean and variance of the linear combination $g_t' z_t$ 
given all the data, using the modified Kalman filter. To do this, the recursions are applied up 
to time $t$ to obtain $m(t \mid t)$ and $V(t \mid t)$. Then the state vector $z_t$ is augmented by the state 
$z_{t,r+1} = g_t' z_t$, and $m(t \mid t)$ and $V(t \mid t)$ are also appropriately augmented. The matrix $F$ in 
(3.1b) is modified to add the equation $z_{t+1,r+1} = z_{t,r+1}$. After these modifications, the 
modified Kalman filter can be used as before, so that the last component of $m(T \mid T)$ gives 
the conditional expectation of $g_t' z_t$, given all the data, $y_1, y_2, \ldots, y_T$. As well, the last 
diagonal component of $V(T \mid T)$ gives the conditional variance. This procedure can be 
generalized to include any number of smoothed estimates and their conditional covariances. 
In applications, space limitations on the computer might preclude computing the smoothed 
values for a large number of time points. 
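
The augmentation device described above can be sketched in a few lines. The example below is illustrative only and rests on simplifying assumptions that are not in the paper: it uses an ordinary Kalman filter with a proper (non-diffuse) initial distribution instead of the modified filter of Kohn and Ansley, it keeps the vectors h and g fixed over time, and all function names are hypothetical.

```python
import numpy as np


def kalman_step(m, V, y, F, Q, h):
    """One predict/update step for z_t = F z_{t-1} + noise (covariance Q), y_t = h'z_t."""
    m_pred = F @ m
    V_pred = F @ V @ F.T + Q
    f = h @ V_pred @ h                      # variance of the one-step prediction error
    gain = V_pred @ h / f
    m_new = m_pred + gain * (y - h @ m_pred)
    V_new = V_pred - np.outer(gain, h @ V_pred)
    return m_new, V_new


def smooth_by_augmentation(y, t0, g, F, Q, h, m0, V0):
    """Conditional mean and variance of g'z_{t0} given y_1..y_T via state augmentation."""
    m, V = np.array(m0, float), np.array(V0, float)
    for t in range(1, len(y) + 1):
        m, V = kalman_step(m, V, y[t - 1], F, Q, h)
        if t == t0:
            r = len(m)
            Vg = V @ g
            V_aug = np.empty((r + 1, r + 1))
            V_aug[:r, :r], V_aug[:r, r], V_aug[r, :r], V_aug[r, r] = V, Vg, Vg, g @ Vg
            F_aug = np.eye(r + 1)           # last row keeps the extra state constant
            F_aug[:r, :r] = F
            Q_aug = np.zeros((r + 1, r + 1))
            Q_aug[:r, :r] = Q               # the extra state receives no noise
            m, V, F, Q, h = np.append(m, g @ m), V_aug, F_aug, Q_aug, np.append(h, 0.0)
    return m[-1], V[-1, -1]
```

In this sketch the state appended at time t0 is simply carried forward unchanged, so every later observation updates it through its covariance with the other states. For a stationary AR(1) signal observed with independent error, the result agrees with the conditional mean obtained directly from the joint normal distribution of the signal and the data.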


3.2 Model for 0 


Harvey and Phillips (1979) described a method to put the ARIMA model (2.4) into the 
state space form given by (3.1). The dimension of $z_t$ is $r = \max(p + d + sP + sD, 
q + sQ)$. By augmenting $A = (A_1, \ldots, A_{p+d+sP+sD})'$ or $b = (b_1, \ldots, b_{q+sQ})'$ with zeroes 
to have dimension $r$, the ARIMA model may be written in the form given by (3.1), where 
$h_t = (1, 0, \ldots, 0)'$, $G = (1, -b_1, \ldots, -b_{r-1})'$ and 


$$F = \begin{pmatrix} A_1 & \\ \vdots & I_{r-1} \\ A_{r-1} & \\ A_r & 0' \end{pmatrix},$$


where $I_{r-1}$ is the $(r - 1)$ by $(r - 1)$ identity matrix and $0'$ is a row vector of zeroes. 
In this formulation, the state vector $z_t = (z_{1t}, \ldots, z_{rt})'$ has first component $z_{1t} = \theta_t$ 
and remaining components defined as 
$$z_{it} = A_i\theta_{t-1} + A_{i+1}\theta_{t-2} + \ldots + A_r\theta_{t-(r-i+1)} 
- b_{i-1}\varepsilon_t - b_i\varepsilon_{t-1} - \ldots - b_{r-1}\varepsilon_{t-(r-i)} \tag{3.5}$$
for $i = 2, 3, \ldots, r$ and $t \geq 0$. 
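
A small numerical sketch may help to check this construction. It is not taken from the paper: the function names are hypothetical, and the sketch pads to r = max(len(A), len(b) + 1) so that every moving average coefficient fits into G, which is an assumption of the sketch.

```python
import numpy as np


def arima_state_space(A, b):
    """Build F, G, h for A(B) theta_t = b(B) eps_t, with A(B) = 1 - sum A_j B^j
    and b(B) = 1 - sum b_j B^j, in the companion form described in the text."""
    r = max(len(A), len(b) + 1)             # padding choice assumed for this sketch
    A_pad = np.zeros(r)
    A_pad[:len(A)] = A
    b_pad = np.zeros(r - 1)
    b_pad[:len(b)] = b
    F = np.zeros((r, r))
    F[:, 0] = A_pad                         # first column holds A_1, ..., A_r
    F[:-1, 1:] = np.eye(r - 1)              # identity block above the zero row
    G = np.concatenate(([1.0], -b_pad))
    h = np.zeros(r)
    h[0] = 1.0
    return F, G, h


def impulse_response(F, G, h, n):
    """Response of theta_t to a unit shock at time 0, for t = 0, ..., n - 1."""
    out, state = [], G.copy()
    for _ in range(n):
        out.append(h @ state)
        state = F @ state
    return np.array(out)
```

The impulse response computed from (F, G, h) should reproduce the weights obtained by running the ARMA recursion directly, which gives a quick check that the matrices were assembled correctly.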


To complete the specification for $\{\theta_t\}$, initial conditions for $z_0$ are required. These are given 
in Ansley and Kohn (1985), a summary of which is provided here. 


From expression (2.5), $\{u_t\}$ is an ARMA process. We define 
$$\theta_- = (\theta_0, \theta_{-1}, \ldots, \theta_{-S})',$$
where $S = \max(0, p + sP + d + sD - 1)$. We let 
$$u_- = (u_0, u_{-1}, \ldots, u_{-R})',$$
where $R = \max(0, p + sP - 1)$. Finally, we let 
$$w_- = (\theta_{-R-1}, \theta_{-R-2}, \ldots, \theta_{-S})'$$
when $S > R$. 


Now, $u_-$ is assumed to be a stationary ARMA process, so that its covariance matrix can 
be derived from expression (2.5). It is assumed that $w_-$ is $N(0, \kappa I)$ and is independent of $u_-$. 
Since $(u_-', w_-')'$ is a non-singular linear combination of $\theta_-$, the covariance matrix for $\theta_-$ can 
be derived. Using the form of expression (3.5) for $z_0$, the initial covariance matrix can be 
computed. Note that when both $d$ and $D$ are zero, so that no differencing takes place in the 
model, then $w_-$ is the null vector and we have $u_- = \theta_-$. 


3.3. Model for the Observed Data 


In Section 2 we assumed that $e_t = k_t w_t$, where $w_t$ is an ARMA$(m,n)$ model. Therefore, 
from the discussion in Section 3.2, it is clear that $e_t$ can be represented in state space form, 
with $h_t = (k_t, 0, \ldots, 0)'$, and $e_t = h_t' z_t$. 

The regression component can be similarly represented by adding $\gamma$ to the state vector and 
initially assuming that $\gamma$ has mean zero and covariance $\kappa I$. Note that in the transition 
equation $\gamma$ remains constant. 

Since we can represent each of the components of $y_t$ in expression (2.1) by a state space 
model, it is straightforward to combine the individual models into an overall model, by 
extending the state vector to include the state vectors from the individual components. The 
observation equation is then the sum of the three individual components. 
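
The stacking of the three components can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the component matrices are taken as given, the two noise sources are placed in separate columns of G, and the function names are hypothetical.

```python
import numpy as np


def combine_state_space(F_theta, G_theta, F_w, G_w, n_reg):
    """Stack the population, survey error and regression components into one model."""
    r_theta, r_w = F_theta.shape[0], F_w.shape[0]
    r = r_theta + r_w + n_reg
    F = np.zeros((r, r))
    F[:r_theta, :r_theta] = F_theta
    F[r_theta:r_theta + r_w, r_theta:r_theta + r_w] = F_w
    F[r_theta + r_w:, r_theta + r_w:] = np.eye(n_reg)   # regression coefficients stay constant
    G = np.zeros((r, 2))                                # one column per noise source
    G[:r_theta, 0] = G_theta
    G[r_theta:r_theta + r_w, 1] = G_w
    return F, G


def observation_vector(r_theta, r_w, k_t, x_t):
    """h_t such that h_t'z_t = theta_t + k_t w_t + x_t'gamma."""
    x_t = np.asarray(x_t, dtype=float)
    h = np.zeros(r_theta + r_w + x_t.size)
    h[0] = 1.0              # picks theta_t
    h[r_theta] = k_t        # picks w_t and rescales it to e_t
    h[r_theta + r_w:] = x_t
    return h
```

Because the combined transition matrix is block diagonal, each component evolves exactly as it would on its own; only the observation vector ties them together.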


4. ESTIMATION OF THE STATE SPACE MODEL 


4.1 Estimation of the Parameters 


The unknown parameters of this model are $\sigma^2$, and the coefficients of $\lambda(B)$, $\alpha(B)$, $\nu(B)$ 
and $\beta(B)$. We transformed $\sigma^2$ to $\log(\sigma^2)$ in the numerical maximization procedure described 
below to avoid problems with negative parameter values. The model for the vector of 
observations $y = (y_1, y_2, \ldots, y_T)'$ given in Section 3 is equivalent to 


$$y = M\eta + \zeta, \tag{4.1}$$


where $\eta$ is $j$-variate $N(0, \kappa I)$, $\zeta$ is $T$-variate $N(0, W)$, and $M$ is some fixed $T \times j$ matrix. We 
note that $\eta$ contains unknown constants including the regression coefficients; $W$ is a function 
of the ARMA parameters; $M$ is a function of the differencing structure. 

Kohn and Ansley (1986) recommended maximizing the limit of $\kappa^{j/2}$ times the likelihood 
function for the data, as $\kappa$ tends to infinity. It can be shown that this limit of the likelihood 
function is equivalent to the marginal likelihood function of $y - M\hat{\eta}$, where $\hat{\eta}$ is the maximum 
likelihood estimate of $\eta$ when $M$ and $W$ are known. Tunnicliffe-Wilson (1989) has shown that 
the Jacobian of the transformation from the data $y$ to $(\hat{\eta}, y - M\hat{\eta})$ does not depend on the 
model parameters of $W$ whenever $M$ is known. Ansley and Kohn (1985) have shown that $M$ 
does not depend on the unknown parameters. By using the modified Kalman filter, the 
computations for the marginal likelihood function are more straightforward than the approach 
given by Tunnicliffe-Wilson. 

The procedure we employed computes both the marginal likelihood function and its first 
derivatives with respect to the unknown parameters. This involves taking first derivatives 
of the initial conditions and of m(t | t’) and the components of V(t | ¢t’) for t = t’ and 
t = t’ + 1. All the computations were done using PROC IML in SAS. 

The likelihood function was maximized using a modification of the method of scoring. This 
modification allowed for varying step sizes. On each iteration, the likelihood function was 
computed at the previous step size, as well as at this step size multiplied and divided by a 
predetermined constant. (We used 1.1 as the factor.) The next step size was to choose the point 
which maximized the likelihood function among the three points. Each time a check was made 
to determine whether the parameters were in range. This was done by checking for positive 
semi-definiteness of the initial covariance matrix of the state vector. If it was out of range, 
the step size was divided again by the constant and the procedure repeated. 
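
One possible reading of this step-size rule is sketched below. It is an interpretation, not the PROC IML code used in the paper: the callables `loglik`, `score`, `info` and `in_range` are hypothetical user-supplied functions, and the convergence rule and shrink limit are assumptions of the sketch.

```python
import numpy as np


def scoring_with_step_search(loglik, score, info, in_range, start,
                             factor=1.1, step=1.0, max_iter=100, tol=1e-10):
    """Method of scoring where each iteration tries three step sizes and keeps the best."""
    params = np.asarray(start, dtype=float)
    current = loglik(params)
    for _ in range(max_iter):
        direction = np.linalg.solve(info(params), score(params))
        best = None
        for s in (step, step * factor, step / factor):
            candidate = params + s * direction
            shrinks = 0
            # Divide the step by the factor until the candidate is in range.
            while not in_range(candidate) and shrinks < 200:
                s /= factor
                candidate = params + s * direction
                shrinks += 1
            if not in_range(candidate):
                continue
            value = loglik(candidate)
            if best is None or value > best[0]:
                best = (value, s, candidate)
        if best is None:
            break
        value, step, new_params = best
        converged = abs(value - current) < tol
        params, current = new_params, value
        if converged:
            break
    return params, current
```

In the paper the range check is the positive semi-definiteness of the initial covariance matrix of the state vector; in this sketch any boolean test can be passed as `in_range`.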

To estimate the variance matrix for the estimated parameters, the inverse of the Fisher infor- 
mation matrix was used. This is readily computed since the first derivatives of the likelihood 
function are available. 


4.2 Estimation of the Smoothed Values 


Smoothed values as defined in (3.4) for the estimates can be obtained by zeroing out that 
component of the state vector which corresponds to the survey error. However, this still leaves 
open the question of how to estimate its variance. To derive the standard error of the smoothed 
estimate it is necessary to account for the fact that the unknown parameters have been estimated 
from the data, particularly when the data series is short; see Jones (1979). 

To obtain the variance of $g'z_t$, it is sufficient to derive the variance of $z_T - \hat{m}(T \mid T)$, where 
$\hat{m}(T \mid T)$ is the estimate of $m(T \mid T)$ at the estimated parameter values. This is because the 
state vector has been augmented to include $g'z_t$. Now, 


$$z_T - \hat{m}(T \mid T) = [z_T - m(T \mid T)] + [m(T \mid T) - \hat{m}(T \mid T)]. \tag{4.2}$$


The first component of the right hand side of (4.2) has conditional variance 
$V(T \mid T) = V_0(T \mid T)$, assuming that $V_1(T \mid T) = 0$. The second component of (4.2) 
represents a bias term and is independent of the first term, since it depends only on the data 
$y$. By taking a Taylor series expansion of the second term around the true parameter values 
and ignoring higher terms, we have the second component of (4.2) is 

$$m(T \mid T) - \hat{m}(T \mid T) = \frac{\partial m(T \mid T)}{\partial \phi'}\,(\phi - \hat{\phi}), \tag{4.3}$$


where $\phi$ is the vector of unknown parameters and $\hat{\phi}$ is its estimate. Therefore, the asymptotic 
variance of (4.2) is approximately 


$$\mathrm{Var}[z_T - \hat{m}(T \mid T)] = V_0(T \mid T) 
+ \left[\frac{\partial m(T \mid T)}{\partial \phi'}\right] V_\phi \left[\frac{\partial m(T \mid T)}{\partial \phi'}\right]', \tag{4.4}$$


where $V_\phi$ is the covariance matrix for the unknown parameters. Expression (4.4) is estimated 
by using the estimated parameter values. This is the same approach as that given by Ansley 
and Kohn (1986). 
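
The variance correction for estimated parameters can be illustrated with a short sketch. It assumes that the filtered mean is available as a function of the parameter vector and approximates its derivative by forward differences; the paper instead computes analytic first derivatives, so the helper names and the finite-difference step are assumptions.

```python
import numpy as np


def numerical_jacobian(func, params, eps=1e-6):
    """Forward-difference Jacobian of a vector-valued function of the parameters."""
    params = np.asarray(params, dtype=float)
    base = np.asarray(func(params), dtype=float)
    jac = np.zeros((base.size, params.size))
    for j in range(params.size):
        shifted = params.copy()
        shifted[j] += eps
        jac[:, j] = (np.asarray(func(shifted), dtype=float) - base) / eps
    return jac


def corrected_variance(V0_TT, filtered_mean, params_hat, V_params):
    """Filter variance plus the delta-method term for parameter uncertainty."""
    jac = numerical_jacobian(filtered_mean, params_hat)
    return V0_TT + jac @ V_params @ jac.T
```

Here `filtered_mean` would rerun the filter at a given parameter value and return the final state mean, and `V_params` would be the inverse Fisher information evaluated at the estimates.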


4.3. Generalized Recursive Residuals 


As Harvey and Durbin (1986) pointed out, useful quantities for performing model 
diagnostics are the generalized recursive residuals. In terms of our state space model, this is 
the difference between the observation and the one-step ahead prediction from the Kalman 
filter. These can be used for all time points $t$ where $V_1(t + 1 \mid t) = 0$. Under the model, 
these residuals are approximately independent normal. They can be standardized to have an 
estimated variance of unity under the model. Diagnostics similar to those used in classical regres- 
sion models can then be performed. 


5. ANALYSIS OF LABOUR FORCE DATA 


5.1 Parameter Estimation 


To demonstrate this procedure, we take data from the Canadian Labour Force Survey (LFS). 
The LFS is a monthly rotating panel survey with each panel containing one-sixth of the selected 
households. A panel will remain in the sample for six consecutive months while the primary 
sampling units will rotate out after approximately two years. The sample selection follows a 
stratified multi-stage design. 
The data were the monthly number of unemployed as published from January 1977 to 
December 1986 for the province of Nova Scotia and for the subprovincial region within Nova 
Scotia corresponding to Cape Breton Island. This province was selected because the sampling 
errors are moderate compared to the larger provinces. Cape Breton Island was selected because 
its smaller sample size provides estimates with a larger relative variance. Graph 1a displays 
the logarithm of the Nova Scotia series and Graph 1b shows the similarly transformed Cape 
Breton Island series. We used the logarithms as our inputs. 

Lee (1990) estimated the autocorrelations for the Nova Scotia survey error up to a lag of 
eleven. We derived the coefficients of the ARMA (m,n) survey error process given in (2.7) 
by matching these autocorrelations. A good fit was found using an ARMA (3,6) model. The 
resulting coefficients were: 


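$\phi_1$ = [?]          $\psi_1$ = 0.1847 
$\phi_2$ = -0.3580      $\psi_2$ = -0.5873 
$\phi_3$ = -0.6041      $\psi_3$ = 0.3496 
                   $\psi_4$ = 0.0647 
$\tau$ = 0.7246         $\psi_5$ = 0.0982 
                   $\psi_6$ = 0.0347. 

([?] marks a value that is not legible in the source.) 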
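
Matching autocorrelations requires the theoretical autocorrelation function of a candidate ARMA model. The sketch below computes it from a truncated moving average representation. It is an illustration only: the function name and the truncation length are assumptions, and a stationary autoregressive part is assumed.

```python
import numpy as np


def arma_autocorrelations(phi, psi, max_lag, n_weights=2000):
    """Autocorrelations of phi(B) w_t = psi(B) eta_t, with phi(B) = 1 - sum phi_i B^i
    and psi(B) = 1 - sum psi_j B^j, using a truncated MA(infinity) expansion."""
    phi = np.asarray(phi, dtype=float)
    psi = np.asarray(psi, dtype=float)
    ma = np.zeros(n_weights)
    ma[0] = 1.0
    ma[1:len(psi) + 1] = -psi
    weights = np.zeros(n_weights)
    for k in range(n_weights):
        acc = ma[k]
        for i, coef in enumerate(phi, start=1):
            if k - i >= 0:
                acc += coef * weights[k - i]
        weights[k] = acc
    gamma = np.array([weights[:n_weights - lag] @ weights[lag:]
                      for lag in range(max_lag + 1)])
    return gamma / gamma[0]
```

With the autocorrelations estimated up to lag eleven, one would compare `arma_autocorrelations(phi, psi, 11)` against the estimates for candidate coefficient values and choose the values giving the closest agreement.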


The $k_t$'s of (2.6) were the estimated standard errors of the estimates, derived by taking a 
Taylor series approximation for the logarithms. 

A series of models were fitted to the Nova Scotia data with an assumption of no sampling 
error. The same models were then refitted, incorporating the model for the survey error process. 
In this case we could also compute smoothed values for the survey estimates and compare their 
standard errors with the standard errors of the original series. 

The preliminary model selected for the Nova Scotia data, ignoring the sampling error, was 
a seasonal ARIMA $(1,1,0)(0,1,1)_{12}$. However, the moving average term for the seasonal 
component was estimated to be one, so a deterministic regression term was used to account for 
the seasonality. The 12 regression variables included a linear term and a dummy variable for 
each of the first 11 months. The dummy variable for a reference month took the value 1 for 
the reference month, —1 for December and 0 for the other months. Note that an intercept 
term is not appropriate for this model because the first differences of the data are fitted. 
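
The coding of the 12 regression variables can be made concrete with a short sketch. The function name and the choice of 1, 2, 3, ... as the values of the linear term are assumptions made for illustration.

```python
import numpy as np


def seasonal_design(n_months, first_month=1):
    """Design matrix with a linear term and 11 monthly contrasts (December coded -1)."""
    X = np.zeros((n_months, 12))
    for t in range(n_months):
        month = (first_month - 1 + t) % 12 + 1
        X[t, 0] = t + 1                 # linear term
        if month == 12:
            X[t, 1:] = -1.0             # December takes -1 in every monthly column
        else:
            X[t, month] = 1.0           # columns 1..11 are January..November
    return X
```

For the ten years of monthly data used here, `seasonal_design(120)` gives a 120 by 12 matrix. Each monthly column sums to zero over any complete year, so the implied December effect is minus the sum of the other eleven effects.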

Further analysis of this reduced model showed that the moving average seasonal compo- 
nent was not required in the model. The final model selected for the Nova Scotia data was an 
ARIMA (1,1,0) with a deterministic regression component. This same model was then used 
for the Nova Scotia data with the survey error process incorporated. The same structural model 
was used for the Cape Breton Island series. 

Table 1 displays the parameter estimates. The estimates that do not incorporate the survey 
error component are in the Without Sampling Error columns. First, examining the models 
for Cape Breton Island shows that the regression estimates are similar, as would be expected. 
Note that the autoregressive estimates (AR) are also similar and that the With Sampling Error 
model has reduced the estimated model variance substantially. The column headed T-value 
displays the estimated parameter divided by its standard error. Note that the t-values for the 
autoregressive parameter are substantially different (-0.68 vs -2.85). This would lead to 
Table 1 


Parameter Estimates - Unemployment Series 1977-1986 

                         Nova Scotia                          Cape Breton Island 
            Without Sampling    With Sampling      Without Sampling    With Sampling 
                 Error              Error               Error              Error 
Parameter   Estimate  T-value   Estimate  T-value   Estimate  T-value   Estimate  T-value 

Alpha        -0.296    [?]       0.862     2.08      -0.260    -2.85     [?]       -0.68 
Sigma         0.0597   -         0.0032    -          0.1049   -         0.0520    - 
Trend         0.00427   1.01     0.00420   1.89       0.00607   0.79     0.00598    1.50 
January       0.064     3.60     0.048     1.93      -0.007    -0.23    -0.003     -0.10 
February      0.083     4.80     0.078     3.30       0.027     0.89     0.028      0.97 
March         0.166    10.20     0.165     6.40       0.171     5.76     0.164      5.76 
April         0.106     6.60     0.104     4.10       0.099     3.53     0.089      3.19 
May           0.009     0.60     0.016     0.70      -0.008    -0.28    -0.007     -0.24 
June         -0.101    -6.00    -0.088     [?]       -0.029    -0.96    -0.033      [?] 
July         -0.016    -1.20    -0.014    -0.63       0.082     2.14     0.081      [?] 
August       -0.058    -3.60    -0.062     [?]       -0.011     [?]     -0.009     -0.30 
September    -0.106    -6.60    -0.105    -3.96      -0.104     [?]     -0.098     -3.15 
October      -0.081    -4.80    -0.071    -3.08      -0.084    -2.03    -0.069     -2.44 
November     -0.026    -1.80    -0.029     [?]       -0.063    -2.10    -0.074     -2.46 

([?] marks an entry that is not legible in the source.) 


accepting a model for the Cape Breton Island data with only a deterministic regression term 
when the survey error process is incorporated into the model. However, if the survey error is 
ignored in the analysis, too much significance would be attached to the autoregressive 
parameter. 

The results for the Nova Scotia models are also displayed in Table 1. Note that the reduction 
in the estimate of the model variance by incorporating the sampling error structure is much 
greater for the Nova Scotia series than was achieved for the Cape Breton data. An important 
result in the Nova Scotia models is the difference in the estimates for the autoregressive 
component. The AR component is highly significant in both models. The 
Without Sampling Error model gives an estimate of $\alpha = -0.296$, whereas the With Sampling Error 
model gives an estimate of $\alpha = 0.862$. Clearly, the interpretations that would be associated 
with these two estimates are entirely different. 

The smoothed estimates for the model incorporating sampling error are shown superimposed 
on the original data series in Graph 1a. Graph 1b shows the smoothed estimates for Cape Breton 
Island superimposed on the original series. The most notable item in these plots is the impact 
of the recession of 1981 on the smoothed estimates. Prior to the recession, the model tends 
to overestimate unemployment and after 1981 the model tends to underestimate the number 
of unemployed. 


5.2 Model Validation 


The plots of the generalized recursive residuals (described in Section 4.3) against the lagged 
generalized recursive residuals were produced for all the models. Graphs 2a and 2b show these 
plots for the two models for Nova Scotia. Note that Graph 2a shows less dispersion around 
the origin than Graph 2b, indicating a better fit when survey error is incorporated in the model. 
Graph 1a Nova Scotia Observed and Smoothed Values (Log Transform) 

Graph 1b Cape Breton Island Observed and Smoothed Values (Log Transform) 

Graph 2a Nova Scotia One Step Ahead Prediction Errors - Survey Error Included (axes: Residuals, Lagged residuals) 

Graph 2b Nova Scotia One Step Ahead Prediction Errors - Survey Error Ignored (axes: Residuals, Lagged residuals) 

Graph 3a Cape Breton Island One Step Ahead Prediction Errors - Survey Error Included (axes: Residuals, Lagged residuals) 

Graph 3b Cape Breton Island One Step Ahead Prediction Errors - Survey Error Ignored (axes: Residuals, Lagged residuals) 




Survey Methodology, December 1990 


Graph 4a Nova Scotia CUSUM of One Step Ahead Prediction Errors (time axis: Mar. 79 to Mar. 86) 

Graph 4b Cape Breton Island CUSUM of One Step Ahead Prediction Errors 

The same plots for Cape Breton Island are shown in Graphs 3a and 3b. There is a striking 
similarity in the resulting residual plots for the two models from Cape Breton. However, none 
of the four plots gives any compelling reason to doubt the underlying normal assumption of 
any of the models. 

To test that the models did not undergo a structural change, the recursive residuals can be 
cumulatively summed to create a CUSUM chart. Although the tests described in Brown, 
Durbin and Evans (1975) produced no significant results, the chart does suggest that some 
structural change may be occurring. The CUSUM for Nova Scotia, as displayed in Graph 4a, shows 
quite clearly that prior to the recession the residuals are generally negative, implying that the 
model predictors are too large. During the 1981 recession the model produces mainly positive 
residuals. This implies that the model predictors are too small. The CUSUM for the Cape Breton 
Island models is shown in Graph 4b. Here we can see that the model that includes the survey 
error undergoes an earlier structural change. 
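
The construction of such a chart from the filter output can be sketched as follows. This is a simplified illustration: it scales by the sample standard deviation of the residuals and does not compute the significance bounds of Brown, Durbin and Evans (1975); the function names are hypothetical.

```python
import numpy as np


def standardized_residuals(innovations, innovation_variances):
    """One-step-ahead prediction errors scaled to unit variance under the model."""
    innovations = np.asarray(innovations, dtype=float)
    return innovations / np.sqrt(np.asarray(innovation_variances, dtype=float))


def cusum(residuals):
    """Cumulative sum of the residuals, scaled by their sample standard deviation."""
    residuals = np.asarray(residuals, dtype=float)
    return np.cumsum(residuals) / residuals.std(ddof=1)
```

A run of negative residuals followed by a run of positive ones, as described for Nova Scotia, shows up as a chart that falls steadily and then turns upward at the point where the sign changes.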

We see, therefore, that model improvements can be made. By incorporating an extra regres- 
sion variable corresponding to the structural changes noted in the CUSUM chart, further 
analysis can be performed within the same general framework. The form of such a variable 
is currently being investigated. 


5.3 Summary 


These examples demonstrate the importance of accounting for survey errors in certain time 
series analyses. Using the modified Kalman filter, we have developed a flexible method for 
parameter estimation, data smoothing and model validation for a wide variety of commonly 
used models. 
Spatial-Temporal Modelling of Spatially Aggregate 
Birth Data 


DAVID R. BRILLINGER¹ 


ABSTRACT 


Births by census division are studied via graphs and maps for the province of Saskatchewan for the years 
1986-87. The goal of the work is to see how births are related to time and geography by obtaining con- 
tour maps that display the birth phenomenon in a smooth fashion. A principal difficulty arising is that 
the data are aggregate. A secondary goal is to examine the extent to which the Poisson-lognormal can
replace, for data that are counts, the normal regression model for continuous variates. To this end a
hierarchy of models for count-valued random variates is fit to the birth data by maximum likelihood.
These models include: the simple Poisson, the Poisson with year and weekday effects and the Poisson- 
lognormal with year and weekday effects. The use of the Poisson-lognormal is motivated by the idea 
that important covariates are unavailable to include in the fitting. As the discussion indicates, the work 
is preliminary. 


KEY WORDS: Aggregate data; Borrowing strength; Contouring; Extra-Poisson variation; Locally- 
weighted analysis; Maps; Periodogram; Poisson distribution; Poisson-lognormal 
distribution; Random effects; Spatial data; Time series; Unmeasured covariates. 


1. INTRODUCTION 


The concern of this work is spatial-temporal data, that is, quantities recorded as functions
of space and time. The analysis of such data should be "easy" because of the graphing
possibilities, e.g. rate versus time or effect versus geography, in the manner of residual plots
so often employed in regression analysis; however, in the present case the aggregation of basic
elements leads to substantial difficulties.

The specific data studied consist of daily births for the calendar years 1986 and 1987 to
women aged 25-29 for each of the 18 census divisions of the province of Saskatchewan. The
corresponding population sizes, as determined in the 1986 Census, are also employed in order
to compute rates. The reason that Saskatchewan was selected for this pilot study is that it is
moderately sized and its boundaries and those of its census divisions are fairly regular. (The latter
was important at the early stages of the work because computer-based maps were then
unavailable.) Women aged 25-29 were selected because that was the 5-year age group with the most
births. These data were provided to the author by Statistics Canada. They are characterized
by being aggregate, by being non-Gaussian and by being non-stationary in space and time.

The aim is to understand the relationship of births to time and geography, specifically to
allow temporal and spatial patterns of fertility and possible surprises to show themselves. There
are two central aspects to the study: a locally-weighted analysis of aggregate data is developed,
and random effects models are set down and fit to handle extra-Poisson variation. The latter
part may be viewed as an inquiry into the flexibility of the Poisson-lognormal to handle
unmeasured covariates and errors. The locally weighted analysis proceeds by developing
weights, $w_i(x,y)$, that are meant to reflect the influence of the i-th census division (an
aggregate) on the point location with coordinates (x,y). Given census division data, these
weights are then applied to individual terms of the log-likelihood or corresponding estimation
equations and parameter estimates evaluated.

Figure 1. Top: Time series of daily births to women aged 25-29 in 1986 for the Province of Saskatchewan.
Bottom: Periodogram of the square roots of the counts graphed above. The solid lines provide approximate
95% marginal confidence limits. The peak corresponds to a period of 7 days.

It is to be emphasized that this is a preliminary report on work in progress. For example 
the fine structure of the data is not taken advantage of and no measures of uncertainty of the 
various estimates have been provided. The expressions employed for the weights, in this present 
work, are naive and bound to change form with further study, but the character of the analysis 
may be anticipated to remain of some interest. 

The companion paper Brillinger (1990) considers some aspects of the spatial case alone. 


2. BIRTHS AS A TIME SERIES 


The top graph of Figure 1 provides the total number of births in Saskatchewan for each 
day of 1986. The dashed line is the 1986 mean level. The solid line is the result of heavily 
smoothing the series and is meant to highlight any trend. This graph does not, with casual 
inspection, provide striking evidence of any special phenomenon. However when the 
periodogram of the square root of the counts is computed, see bottom graph of Figure 1, 
something of interest appears. (The square root is employed to make the series more nearly 
symmetrical and normal). The upper and lower solid lines on the graph provide approximate 
95% marginal confidence limits about a heavily smoothed version. A peak is apparent at a 
frequency of .143 cycles/day corresponding to a period of 7 days. This periodic phenomenon 
is well known in the analysis of birth data, see e.g. Cohen (1983) and Miyaoka (1989) and 
references therein. It is usually ascribed to doctors intervening in the natural process of labour 
and inducing births particularly on weekdays. 
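
The periodogram computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the synthetic counts and the weekday and weekend means are assumptions, not the Saskatchewan data, and no smoothing or confidence limits are computed.

```python
import numpy as np


def periodogram(series):
    """Raw periodogram of a daily series at the Fourier frequencies (cycles/day)."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # remove the mean level
    n = x.size
    freqs = np.fft.rfftfreq(n, d=1.0)       # frequencies in cycles per day
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    return freqs[1:], power[1:]             # drop the zero frequency


# Hypothetical daily counts for 364 days, with more births on weekdays.
rng = np.random.default_rng(0)
days = np.arange(364)
is_weekday = (days % 7) < 5
counts = rng.poisson(np.where(is_weekday, 15.0, 11.0))

# The square root is taken, as in the text, to make the series more nearly
# symmetrical before the periodogram is computed.
freqs, power = periodogram(np.sqrt(counts))
peak_frequency = freqs[np.argmax(power)]
```

With a weekly pattern of this kind one would expect the largest ordinate to fall near 1/7 = .143 cycles/day, with smaller peaks possible at the harmonics 2/7 and 3/7.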


3. BIRTHS AS A SPATIAL PROCESS 


Figure 2 provides, for each census division and for women aged 25 to 29, the annual rate
of births for the years 1986 and 1987 combined. One sees the highest rate of .208 births per 
woman per year to occur in the northern half of the province while the two lowest rates appear 
in the census divisions containing Regina and Saskatoon. 

Figure 3 provides the numerical difference between the annual rate for 1987 and that for 
1986 for each of the 18 census divisions. (Note that the 1986 census population has been taken 
as the divisor in each case). The differences are scattered around 0. It is to be noted that these 
rates are, however, based on fairly widely varying population sizes. 

In the previous section the presence of a phenomenon of period 7 days was noted. Figure 4 
presents the difference between the average weekday rate and the average weekend rate, 
(weekdays meaning Monday through Friday) for each census division. In all but one census 
division, the weekday rate is higher. This is consistent with various other studies and, as
suggested in Section 2, is very possibly due to doctors inducing labour on weekdays (to avoid births
on weekends).

The various rates presented in Figures 2, 3, 4 are average values for individual census 
divisions. 


4. PROBLEMS ARISING 


Maps of most quantities of direct interest that assign average values to the wholes of 
counties thereby lie, lie, lie. 
With these graphic words Tukey (1979) deplores the use of maps such as those of Figures
2, 3, 4 that are constant across geographic divisions. Indeed examination of Figure 2, as does 
common knowledge, suggests that the birth phenomenon quite likely varies smoothly across 
census division boundaries. A principal concern of this work is to develop contour maps dis- 
playing smooth variation. It is hoped that such maps will prove useful in the discovery of general 
stochastic descriptions of the phenomenon and will allow insightful exploratory analyses. 

A second concern of this work is with the statistical distribution of the counts themselves. 
A natural special stochastic model to employ is the Poisson. Yet in past studies the birth process 
has been found to relate to many socio-economic quantities, e.g. diet, lifestyle, weather, 
environment, weekday, holidays, age structure. Further the population of the various census 
divisions has varied around the Census Day values throughout 1986-87 and lastly the women’s 
ages are scattered from 25 to 29. In summary it seems necessary to employ a more flexible model 
than the Poisson, specifically a model able to handle omitted covariates. The Poisson-lognormal 
will be employed in this work. As a sideline, due to the presence of the standard deviation 
parameter in the Poisson-lognormal, there will be a borrowing of strength that takes place in 
combining the data values, in the manner described by Mallows and Tukey (1982). (The term
"borrowing strength" is employed, rather than for example "empirical Bayes" as some might
prefer, because it has been in use for a substantial time period and because of its broader
implications.) Dean et al. (1989) is another recent reference concerned with handling extra-variation.


5. LOCALLY-WEIGHTED ANALYSIS 


In the case of nonaggregate data, locally-weighted fitting is a convenient fashion by which to 
estimate smoothly varying quantities. Suppose one has a variate Y with probability distribution
$p(Y \mid \theta)$ depending on the finite dimensional parameter $\theta$. Suppose one wishes an estimate
of $\theta$ particular to the location with coordinates (x,y). Suppose the datum $Y_i$ is available for
location $(x_i,y_i)$. Let $W_i(x,y)$ be a weight dependent on the distance of $(x_i,y_i)$ to (x,y).


Consider estimating $\theta$ by maximizing the weighted log-likelihood

$$\sum_i W_i(x,y) \log p(Y_i \mid \theta) \qquad (1)$$

or (often equivalently) by solving the system of estimating equations

$$\sum_i W_i(x,y)\, \psi(Y_i \mid \theta) = 0 \qquad (2)$$

with $\psi(Y \mid \theta) = \partial \log p / \partial \theta$, the score function.

Figure 2. The average annual birth rate for women aged 25 to 29 for the years 1986 and 1987,
plotted above census divisions. "R" and "S" indicate the locations of Regina and Saskatoon
respectively.

Figure 3. The 1987 rate minus the 1986 rate for the same data as Figure 2.

Figure 4. The average weekday rate minus the average weekend rate for the same data as Figure 2.

Figure 5. The weights, $W_i(x,y)$, applied in equations (1) or (2), computed via expression (4), for four of
the census divisions. The weights are not shown for all the divisions in the interests of clarity.
The contours at levels .50 and .99 are shown.

To illustrate the technique consider an elementary case, specifically take Y to be normal with
mean $\mu$ and variance $\sigma^2$. The locally weighted estimate of $\mu$ at (x,y) results from minimizing

$$\sum_i W_i(x,y)\, (Y_i - \mu)^2$$

and is given by

$$\hat\mu(x,y) = \frac{\sum_i W_i(x,y)\, Y_i}{\sum_i W_i(x,y)},$$

an expression with intuitive appeal. It is to be noted that such formulas are commonly used
in computer graphics as interpolation procedures, see for example Franke (1982).
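
The normal case just described, a weighted mean of the data with weights depending on distance, can be sketched as follows. The Gaussian form of the kernel, the bandwidth parameter and the function names are assumptions made for this illustration; the text does not prescribe a particular kernel at this point.

```python
import numpy as np


def gaussian_kernel_weights(points, target, bandwidth):
    """Weights W_i(x, y) that decay with the distance from (x_i, y_i) to the target.

    Assumption: a Gaussian kernel with a single bandwidth.
    """
    diff = np.asarray(points, dtype=float) - np.asarray(target, dtype=float)
    squared_distance = np.sum(diff ** 2, axis=1)
    return np.exp(-0.5 * squared_distance / bandwidth ** 2)


def local_mean(points, values, target, bandwidth):
    """Locally weighted estimate of the mean: sum_i W_i Y_i / sum_i W_i."""
    w = gaussian_kernel_weights(points, target, bandwidth)
    return np.sum(w * np.asarray(values, dtype=float)) / np.sum(w)
```

Evaluating `local_mean` over a grid of target locations gives a smooth surface; a small bandwidth reproduces the nearest datum, while a large one approaches the overall mean.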


Among references we may mention Gilchrist (1967), concerned with "discounting", Pelto
et al. (1968), concerned with least squares, Cleveland and Kleiner (1975), who suggested the
use of moving midmeans, and Stone (1977), focusing on regression. In the discussion of Stone's
paper, Brillinger (1977) suggested the form (2) for a general distribution and justified it as a
Bayes' rule. Specifically consider the loss function

$$L(Y \mid \theta) = -\log p(Y \mid \theta).$$

Suppose an estimate is desired at r = (x,y). The Bayes' risk may be written

$$E\{L(Y \mid \theta_r)\} = E\{E\{L(Y \mid \theta_r) \mid r\}\}.$$

Bayes' rule seeks

$$\min_\theta E\{L(Y \mid \theta) \mid r\}.$$

With data $Y_i$, $r_i$, and $W_i(r)$ a kernel centred at $r_i$, one approximates the conditional expected
value here by

$$E\{\log p(Y \mid \theta) \mid r\} \approx \sum_i W_i(r) \log p(Y_i \mid \theta)$$

and so is led to expression (1).


Tibshirani and Hastie (1987) develop an equi-weighted local likelihood estimation procedure. 
Cleveland and Devlin (1988) develop the least squares approach in real detail. Staniswalis (1989) 
studies and implements the general p case. Advantages of the locally-weighted technique
include: no "hidden model" distribution assumption, the possibility of discerning
non-additivity, variants for resistance and influence, simple additivity of the observation
component, and no matrix inversion (as, for example, kriging requires).

The birth data of concern in this work is aggregate (or grouped) totals over census divisions.
The procedure of the preceding section cannot therefore be employed directly. The problem
is that of obtaining appropriate weights $w_i(x,y)$ evidencing the effect of the census division
i on the location (x,y). Suppose $|R_i|$ denotes the area of census division i. Then the naive
weight function is

$$w_i(x,y) = \frac{1}{|R_i|} \quad \text{for } (x,y) \text{ in } R_i$$

and equal 0 otherwise. In this work functions of the essential form

$$w_i(x,y) = \frac{1}{|R_i|} \iint_{R_i} W(x - u,\, y - v)\, du\, dv \qquad (3)$$

will be employed where W(·) is a kernel appropriate for the nonaggregate case as for example
studied in Cleveland and Devlin (1988). The formula (3) may be motivated by consideration
of the Poisson point process case, see Appendix II. Estimates will be determined via the criteria
(1) or (2) with $W_i$ replaced by $w_i$.
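
A weight of the form (3) is an average of the point kernel over a region, and can be approximated numerically. The sketch below does this for a rectangular region with a Gaussian kernel and a midpoint rule. The rectangle, the kernel, the bandwidth and the function name are all assumptions made for illustration; census divisions are general polygons and the paper does not use this code.

```python
import numpy as np


def aggregate_weight(target, rect, bandwidth, n=50):
    """Approximate (1/|R|) * integral over R of W(x - u, y - v) du dv.

    rect = (x_min, x_max, y_min, y_max) is a hypothetical rectangular region R.
    The integral is approximated by the midpoint rule on an n-by-n grid, so
    the result is simply the average kernel value over the grid midpoints.
    """
    x, y = target
    x_min, x_max, y_min, y_max = rect
    u = x_min + (np.arange(n) + 0.5) * (x_max - x_min) / n
    v = y_min + (np.arange(n) + 0.5) * (y_max - y_min) / n
    uu, vv = np.meshgrid(u, v)
    kernel = np.exp(-0.5 * ((x - uu) ** 2 + (y - vv) ** 2) / bandwidth ** 2)
    return float(kernel.mean())
```

The weight is largest for locations inside or near the region and decays smoothly with distance, in contrast with the naive weight, which drops to 0 at the boundary.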
The specific weights employed at r = (x,y) in this preliminary work are

$$w_i(r) = \exp\{-(1 - \rho)^2\, \|r - \bar r_i\|^2 / 2\tau^2\} \qquad (4)$$

outside the ellipse $(r - \bar r_i) S_i^{-1} (r - \bar r_i)' = d = 5.991$ and equal 1 inside. Here $\|r\|^2 =
x^2 + y^2$, $\rho = \sqrt{d} \big/ \sqrt{(r - \bar r_i) S_i^{-1} (r - \bar r_i)'}$ and $\tau = .025$, where $\bar r_i = E\,U_i$ and $S_i = \mathrm{var}\,U_i$,
with $U_i$ a variate uniformly distributed within $R_i$. This choice of $\rho$ makes the weight function
continuous. The logic is that the census divisions are approximated by ellipses with the same
mean and variance-covariance matrix. (The specific values were chosen after a bit of experimen-
tation, in part to make the area in the initial ellipse about .95 of the division's.) One could have
employed other shapes than ellipses, e.g. rectangles, but this is preliminary work and it is
anticipated that later work will employ weights of the form (3).

Figure 5 displays the .50 and .99 contours of the $w_i(x,y)$ plotted for several of the census
divisions. The contours are seen to follow the general shapes of the census divisions. The
jaggedness in some of the contours results from the discreteness of the 40 × 40 grid employed
in the computations.
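
A weight that equals 1 inside an ellipse and decays continuously outside it, as in expression (4), might be coded as below. This is a sketch of one reading of (4), taking $\rho$ as the square root of the ratio of the ellipse level 5.991 to the quadratic form at r, which is the choice that makes the weight continuous at the ellipse. The function name and argument layout are assumptions, and the units of the coordinates are not specified here.

```python
import numpy as np

ELLIPSE_LEVEL = 5.991   # level d of the quadratic form defining the ellipse
TAU = 0.025             # scale parameter tau quoted in the text


def ellipse_weight(r, centre, cov, d=ELLIPSE_LEVEL, tau=TAU):
    """Weight equal to 1 inside the ellipse and decaying smoothly outside.

    centre and cov stand for the mean and variance-covariance matrix of a
    point uniformly distributed within the census division.
    """
    diff = np.asarray(r, dtype=float) - np.asarray(centre, dtype=float)
    # Quadratic form (r - centre) S^{-1} (r - centre)'
    q = float(diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff))
    if q <= d:
        return 1.0
    rho = np.sqrt(d / q)    # rho = 1 on the ellipse, so the weight is continuous
    return float(np.exp(-(1.0 - rho) ** 2 * (diff @ diff) / (2.0 * tau ** 2)))
```

Here `(1 - rho) * diff` is the vector from the point where the ray from the centre crosses the ellipse to r, so the exponent is a squared distance beyond the ellipse measured along that ray.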

Other weight functions constructed with somewhat similar problems in mind may be found 
in Tobler (1979) and Dyn and Wahba (1982). Advantages of the present approach, as listed 
for the nonaggregate case above include: the terms in (1) or (2) are additive and do not interact, 
no matrix inversion is needed, and resistance to outliers is easily built in. 

Cliff and Ord (1975), Section 5.1, discuss measures of the influence of counties on other
counties. The concern of this present paper however is the influence of a "county" on a point
location. It is to be remarked that perhaps the weight, providing the influence, should depend
on some covariates, e.g. county population.


6. A POISSON FIT 


Throughout the analysis, the female population aged 25-29 and births to its members will
be considered. Let i = 1, ..., 18 index census division. Let $N_i$ denote the census count of the
women in the i-th division. (These are the counts for Census Day, 3 June 1986.) Let $B_i$ denote
the total number of births to women aged 25-29 in the two years 1986-87.

Suppose that the probability distribution p(·) of Section 5 is that $B_i$ is Poisson with mean
$2N_i\mu$. (The presence of the multiplier 2 is so the parameter $\mu$ is an annual birth rate.) One logic
for the Poisson assumption comes from the idea that birthdays are random, see Brillinger (1986).

With the Poisson assumption, the locally weighted estimate of the annual birth rate at
location (x,y) is given by


$$\hat\mu(x,y) = \frac{\sum_i w_i(x,y)\, B_i}{2 \sum_i w_i(x,y)\, N_i}. \qquad (5)$$


These values are computed for (x,y) on a 40 by 40 grid and the corresponding contour plot 
is given in Figure 6. The contours are seen to vary smoothly. This (smoothed) rate varies from 
.14 to .20, with the higher values in the upper half and the lower centred around the Province’s 
most urban part. 
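
Under the stated assumption that $B_i$ is Poisson with mean $2N_i\mu$, maximizing the weighted log-likelihood in $\mu$ gives a ratio of weighted births to weighted woman-years. The short sketch below computes that ratio at one location; the function name and the `years` argument are assumptions introduced for illustration.

```python
import numpy as np


def local_birth_rate(weights, births, women, years=2.0):
    """Locally weighted annual birth rate at one location.

    weights: w_i(x, y) for each census division at the location of interest
    births:  B_i, births over the whole period
    women:   N_i, census counts of women at risk
    years:   length of the period in years (2 for 1986-87)
    """
    w = np.asarray(weights, dtype=float)
    b = np.asarray(births, dtype=float)
    n = np.asarray(women, dtype=float)
    return float(np.sum(w * b) / (years * np.sum(w * n)))
```

If every division had the same underlying rate, the estimate would return that rate whatever the weights; the weights matter only when rates differ across divisions.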

Figure 6. Expression (5) graphed for the weights of (4) with $B_i$ the count of births in census division
i during 1986-87 and $N_i$ the corresponding population count of women aged 25-29.

Figure 7. The estimated birth rate $\exp\{\hat\alpha\}$ obtained by locally weighted fitting assuming that the
number of births, B, given the population at risk, N, is Poisson with mean $N\exp\{\alpha \pm \beta \pm \gamma\}$
with the first ± sign plus for weekdays and minus for weekends and the second ± plus
for 1986 and minus for 1987.

Figure 8. Plot of the estimated weekday effect $\hat\beta(x,y)$ obtained as per Figure 7.

Figure 9. The estimated year effect $\hat\gamma(x,y)$ as per Figure 7.

As indicated previously, the data under study has important temporal characteristics. Models
need to take this into account. In particular the weekly periodicity needs to be handled as well
as possible trends in population sizes. The following model seems worth considering. Let j be
an indicator variable with j = 1 if the count is for a weekday and j = 2 if the count is for
a weekend. Let k be a second indicator variable with k = 1 for 1986 and k = 2 for 1987. Let
$B_{ijk}$ denote the corresponding number of births in census division i. Suppose that $B_{ijk}$ given
$N_i$ is Poisson with mean $N_i\exp\{\alpha + \beta_j + \gamma_k\}$. Here $\beta_j$ is the weekday effect and $\gamma_k$ the year effect, and
it will be assumed that $\beta_1 + \beta_2 = 0$ and $\gamma_1 + \gamma_2 = 0$ to make the model identifiable. If there is no
weekday effect, then $\beta_1 = \beta_2 = 0$. If there is no year effect, then $\gamma_1 = \gamma_2 = 0$. Now, via the locally-
weighted analysis presented in Section 5, one can obtain estimates of $\alpha$, $\beta$ and $\gamma$ as functions
of location (x,y). (For simple balance in the computations, only the first 364 = 7 × 52 days
of each year have been employed.)
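
One way such a locally weighted Poisson fit might be computed at a single location is by Newton's method on the weighted log-likelihood, with the sum-to-zero constraints handled by coding each effect as +1 or -1. The sketch below is an assumption-laden illustration: the function names, the ±1 coding, the starting value and the treatment of $N_i$ as the exposure in every cell are choices made here, not details given in the paper.

```python
import numpy as np


def design_matrix(n_divisions):
    """Rows (1, s_j, s_k) with s_j = +1 weekday / -1 weekend, s_k = +1 for 1986 / -1 for 1987."""
    cells = [(1.0, sj, sk) for sj in (1.0, -1.0) for sk in (1.0, -1.0)]
    return np.tile(np.array(cells), (n_divisions, 1))


def fit_weighted_poisson(X, births, exposure, weights, n_iter=50, tol=1e-10):
    """Maximize sum_i w_i [B_i * eta_i - N_i * exp(eta_i)] with eta = X theta."""
    X = np.asarray(X, dtype=float)
    b = np.asarray(births, dtype=float)
    n = np.asarray(exposure, dtype=float)
    w = np.asarray(weights, dtype=float)
    theta = np.zeros(X.shape[1])
    # Start from the weighted overall log rate (first column is the intercept).
    theta[0] = np.log(np.sum(w * b) / np.sum(w * n))
    for _ in range(n_iter):
        mean = n * np.exp(X @ theta)
        score = X.T @ (w * (b - mean))                  # weighted score, as in (2)
        information = X.T @ (X * (w * mean)[:, None])   # weighted information matrix
        step = np.linalg.solve(information, score)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```

Repeating the fit with the weights $w_i(x,y)$ evaluated at each grid point would give the three estimated surfaces, at the cost of one small optimization per grid point.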

Figure 7 provides the estimate $\exp\{\hat\alpha(x,y)\}$ obtained of the annual birth rate. It is interesting
to note that, relative to the constant rate Poisson model, the contours have expanded out
somewhat from the urban areas. Figure 8 provides the estimated weekday effect, $\hat\beta_1(x,y)$, obtained.
In its case there is a bulge to the east. These values are quite a different representation from that
of the naive differences of Figure 4. In particular, now there is a reflection of the differing
population sizes. The order of magnitude of the $\hat\beta$'s is .08 to .13 while $\hat\alpha$ is of order -2.1 to -1.6.
Figure 9 provides the estimated year effect, $\hat\gamma_1(x,y)$. Its values vary from -.03 to .03.
Numerically, the weekday-weekend effect is the larger.

The just preceding analysis suggests that there are basic variables that can affect birth rates
and that modelling and analysis need to take this circumstance into account.


7. POISSON-LOGNORMAL FITS 


With a multi-dimensional explanatory variable x in hand, a Poisson model that has B of
mean $N\exp\{x\theta\}$ might do a good job of explaining the data. Examples of explanatory variables
include: diet, lifestyle, weather, environment, holidays, population change, age structure,
vagaries of boundaries. In the present situation, these variables are not at hand. The omitted
variables in the model will be assumed specifically accumulated into an error variable. It will
be assumed that, given $\varepsilon$, the variate B is Poisson with mean $N\mu\exp\{\varepsilon\}$ and that $\varepsilon$ is normal
with mean 0 and variance $\sigma^2$. In the case of this model B is said to have a Poisson-lognormal
distribution. Some information on this distribution may be found in Shaban (1988). Sometimes
$\varepsilon$ enters directly from the problem context, see Brillinger and Preisler (1983) for one example,
but in the present case it is simply assumed present.

A critical difficulty, that arises in working with a Poisson-lognormal model, is that closed 
expressions do not exist for the probability function. Yet the model is clearly flexible for 
introducing effects and handling unavailable variables. Following the work of Bock and 
Lieberman (1970), Pierce and Sands (1975) and Hinde (1982), one can proceed via numerical 
quadrature. The probability function may be written 


$$p(Y) = \frac{1}{Y!} \int (\nu e^{\sigma z})^{Y} \exp\{-\nu e^{\sigma z}\}\, \phi(z)\, dz$$

with $\phi$ the standard normal density, with Y corresponding to B and with $\nu$ corresponding to
$N\mu$. To proceed with a data analysis the integral is approximated by a finite number of terms
involving nodes, $z_l$, and weights, $w_l$,

$$p(Y) \approx \frac{1}{Y!} \sum_{l=1}^{L} (\nu e^{\sigma z_l})^{Y} \exp\{-\nu e^{\sigma z_l}\}\, w_l.$$


Listings of nodes and weights may be found in Abramowitz and Stegun (1964) for example. 
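
The quadrature approximation to the Poisson-lognormal probability function can be sketched with library-supplied nodes and weights in place of printed tables. The sketch assumes NumPy's `hermegauss` routine, which provides Gauss-Hermite nodes and weights for the weight function exp(-z²/2); dividing the weights by √(2π) makes them appropriate for integration against the standard normal density. The function name and defaults are assumptions made for illustration.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def poisson_lognormal_pmf(y, nu, sigma, n_nodes=5):
    """Approximate P(Y = y) when Y | Z is Poisson with mean nu * exp(sigma * Z), Z standard normal."""
    z, w = hermegauss(n_nodes)            # nodes z_l and weights for exp(-z^2 / 2)
    w = w / math.sqrt(2.0 * math.pi)      # rescale so the weights sum to 1
    rate = nu * np.exp(sigma * z)         # conditional Poisson mean at each node
    # Work on the log scale to avoid overflow in rate**y and y!
    log_terms = y * np.log(rate) - rate - math.lgamma(y + 1)
    return float(np.sum(w * np.exp(log_terms)))
```

Summing the logarithm of this function over the observations gives an approximate log-likelihood in $\nu$ and $\sigma$ that can be passed to a general-purpose optimizer. As noted in Appendix I, results can be sensitive to the number of nodes, so `n_nodes` would typically be increased until the estimates stabilize.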
Figure 10. A plot comparable to Figure 7, except that now a normal error term is added to the
linear predictor.

Figure 11. A plot comparable to Figure 8, except now (as in Figure 10) a normal error term has
been added to the linear predictor.

Figure 12. A plot comparable to Figure 9, except now (as in Figure 10) a normal error term has
been added to the linear predictor.

Figure 13. The estimated standard error, $\hat\sigma(x,y)$, of the normal term added to the linear predictor.


Survey Methodology, December 1990 265 


Figures 10, 11, 12, 13 provide the results of fitting the Poisson-lognormal model including
weekday and year effects and employing L = 5 nodes. The model assumes $B_{ijk}$, given $N_i$ and
Z, is Poisson with mean


$$N_i\exp\{\alpha + \beta_j + \gamma_k + \sigma Z\},$$


Z denoting a standard normal deviate, and further assumes the separate Z's independent. Here i
indexes census division, j weekday or not and k year. Figure 10, a contour plot of $\exp\{\hat\alpha(x,y)\}$,
again shows a dip around the urban region as in Figure 7. The irregularity in the figure
suggests that in one case perhaps the estimation procedure converged to a local extremum. Figures
11 and 12 similarly provide $\hat\beta(x,y)$ and $\hat\gamma(x,y)$. There are again suggestions of local extrema.
Figure 13, a contour plot of $\hat\sigma(x,y)$, is not easily described. It suggests that the estimate, $\hat\sigma$,
is fairly variable. The estimate is seen to be of order of magnitude .1 and so comparable to
the weekday effect of Section 6.

All the work on estimation with the Poisson-lognormal, that we know about, involves some 
form of approximation. For example Clayton and Kaldor (1987) approximate the conditional 
Poisson log-likelihood by a quadratic and Aitchison and Ho (1989) also employ numerical 
integration, albeit after a transformation of the parameters. A new type of approximation has 
recently been proposed in Crouch and Spiegelman (1990). Its effectiveness for the Poisson- 
lognormal remains to be studied. 


8. DISCUSSION 


Locally-weighted analysis and random effect models appear to provide a flexible means of 
dealing with a broad class of problems involving geographic data. The random effect terms 
have two important roles: handling omitted effects and borrowing strength for improved 
estimates of the principal parameters. For the Poisson alone, naive totals are efficient, yet there 
exists extra-Poisson variability due to omitted variables in the present case. 

The approach is computer intensive, because of the numerical integration and the maximum 
likelihood estimation at many points on a grid, but proved quite manageable on the Berkeley 
network of Sun 3/50’s. 

Much future work remains including: tools for assessing fit, uncertainty computation and
display, weight function choice (particularly choice of $\tau$ in (4)), analyses for other age groups
and provinces, and appropriate asymptotics. Further understanding needs to be gained as to
why with nearby initial values the optimizing routine sometimes converged to somewhat distant
estimates. An advantage of the present circumstance is that there exist immense amounts of
other data to be made use of as work progresses. Examination of Figure 6 onward shows an
important limitation of the technique: it is providing too much fine detail in the northern half
of the province.

Other recent papers devoted to the analysis of vital statistics rates are: Cressie and Read 
(1989), Clayton and Kaldor (1987), Tsutakawa (1988) and Manton et al. (1989). These papers 
are however not directed at the problem of obtaining a smooth surface, which is the concern 
of this work. 

It is amusing to note that the presence of the weekly period in the phenomenon allowed the 
author to deduce early on in the work that a confusion had arisen over which data set was to 
be supplied. When the days of fewest births were determined for the initial data set supplied, 
the days were found to be (apparently) Friday and Saturday. This was because the year 1987 
had been supplied, and not the desired 1986. 
After the analyses were completed it was learned that the birth counts were based on 1981
census divisions, while the population counts were based on 1986. Luckily the boundaries have 
not changed much, but this circumstance provides yet more reason for wanting a procedure 
that can handle extra-variation. 


9. ADDENDUM 


In the paper a case has been made for the inclusion of an error term, $\varepsilon$, to reflect pertinent
covariates that were unavailable for the analysis. This led to the employment of the Poisson-
lognormal distribution. In Tukey (1990) an index of urbanicity of a census division is
constructed. It is based on the populations of the three largest places in the division. The values,
$x_i$, of the index are given in Figure 14 and are seen to be lowest in the census divisions
containing Regina and Saskatoon.

The table below gives the results of employing GLIM to fit the successive Poisson models
for $B_{ijk}$, given $N_i$: (i) $N_i\exp\{\alpha + \beta_j + \gamma_k\}$, (ii) $N_i\exp\{\alpha + \beta_j + \gamma_k + \delta x_i\}$, and (iii)
$N_i\exp\{\alpha + \beta_j + \gamma_k + \delta x_i + [\ldots]\}$.


Figure 14. The values of the Tukey index of urbanicity. 
By bringing in this urbanicity variable, $x_i$, a Poisson model is now satisfactory for the
circumstance.

Finally the Referee made some comments that spell out quite specifically the assumptions 
and limitations of this present study. The work is continuing and the intention is to address 
these comments. Rather than paraphrasing, it seems more sensible to provide the referee’s own 
words. 

"The choice of weights is ad hoc and requires more thought. If one had two divisions,
both of the same area but with vastly different populations $N_i$, should the weighting be
the same? It depends on whether area or population density is thought to be more
important. Use of the latter may remove the spurious fine detail in the northern half of the
province."

"There are traps with $N_i$'s, which the author appears to be aware of, but I think the
reader needs extra warning. It might help to have approximate measures of uncertainty
([Section 1] promises none). Figure 3 cannot really be interpreted, since positive or negative
values may be due to random fluctuations about zero. The contours in Figure 6 are calculated
with vastly different precision, and in some respects are incomparable. And, [in Section 6],
upon estimating $\alpha$, $\beta$ and $\gamma$, it would be tempting (but unwise) to assume that such values
are significant."

"All random variables in sight are assumed independent. Another way to motivate these
weighted models is to assume a multivariate distribution, with the property that the conditional
mean at (x,y), given the surrounding data, is a weighted combination of those data. Then the
joint distribution exhibits dependence."
APPENDIX I


In this Appendix a few computing details are provided. The census divisions and the 
province boundaries are specified as polygons. To compute the weights w;(x,y) an algorithm 
was required to check whether a given point was inside a given polygon. To compute the mean 
and variance of a random point inside a given polygon, an algorithm for breaking a polygon 
up into triangles was required. Such algorithms are discussed in Preparata and Shamos (1985) 
for example. The approximate likelihood was maximized via the Harwell FORTRAN routine 
va09a. For the parallel computations the 40 by 40 grid was broken up into 20 disjoint segments 
and the computations thence carried out on 20 separate work stations. As in Brillinger and 
Preisler (1983), factors were introduced into the likelihood to stabilize the computations. 
Miyaoka (1989) found that the computations could be sensitive to the number of nodes 
employed. In the present series of computations, the number was increased until the results 
did not change much. There is also the problem of selecting initial values. Here they were taken
to be the method of moments estimates, although these are perhaps too inefficient.
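
The mean and variance-covariance matrix of a point uniformly distributed inside a polygon can be obtained by breaking the polygon into triangles, as mentioned above. The sketch below uses a fan of triangles from the first vertex with signed areas, together with the standard closed-form first and second moments of a uniform point in a triangle. It is an illustration only, not the algorithm used in the paper; the function name is an assumption and the polygon is assumed to be simple (non-self-intersecting) with vertices listed in order.

```python
import numpy as np


def polygon_mean_cov(vertices):
    """Mean and covariance of a point uniformly distributed inside a simple polygon."""
    v = np.asarray(vertices, dtype=float)
    total_area = 0.0
    first_moment = np.zeros(2)
    second_moment = np.zeros((2, 2))
    a = v[0]
    for b, c in zip(v[1:-1], v[2:]):
        # Signed area of triangle (a, b, c); signs cancel correctly for non-convex polygons.
        area = 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))
        s = a + b + c
        triangle_mean = s / 3.0
        # E[U U'] for a uniform point in a triangle with vertices a, b, c.
        triangle_second = (np.outer(a, a) + np.outer(b, b) + np.outer(c, c) + np.outer(s, s)) / 12.0
        total_area += area
        first_moment += area * triangle_mean
        second_moment += area * triangle_second
    mean = first_moment / total_area
    cov = second_moment / total_area - np.outer(mean, mean)
    return mean, cov
```

For the unit square this gives a mean at the centre and a variance of 1/12 in each coordinate, which provides a quick check on the implementation.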
APPENDIX II


For simplicity, consider the case of a point process $\{x_j\}$ with rate function $\nu$ on the line.
The local weighted log likelihood for a Poisson process is, up to a constant,

$$\sum_j W(x - x_j) \log \nu(x_j) - \int W(x - u)\, \nu(u)\, du.$$

So, the locally weighted estimate of the rate is

$$\hat\nu(x) = \sum_j W(x - x_j) \Big/ \int W(x - u)\, du,$$

the usual form of estimate. Suppose now the line is broken into intervals $R_i$, and the aggregate
count available is $N(R_i)$. One desires

$$\sum_{x_j \in R_i} W(x - x_j).$$

If this last is to be approximated by $\theta N(R_i)$, then the method of moments leads to

$$\theta = \int_{R_i} W(x - u)\, du \Big/ |R_i|$$

and thence to expression (3).
Benchmarking of Economic Time Series


NORMAND LANIEL and KIMBERLEY FYFE¹


ABSTRACT 


Benchmarking is a method of improving estimates from a sub-annual survey with the help of corresponding
estimates from an annual survey. For example, estimates of monthly retail sales might be improved
using estimates from the annual survey. This article first deals with the problem posed by the
benchmarking of time series produced by economic surveys, and then reviews the most relevant methods for
solving this problem. Next, two new statistical methods are proposed, based on a non-linear model for
sub-annual data. The benchmarked estimates are then obtained by applying weighted least squares.


KEY WORDS: Survey errors; Non-linear model; Weighted least squares. 


1. INTRODUCTION 


Traditionally benchmarking has been defined as the method of adjusting monthly or 
quarterly figures derived from one source to annual values (benchmarks) obtained via another 
source (see Denton 1971, Cholette 1988a, and Monsour and Trager 1979). For example, the 
monthly shipments of Canadian Manufacturers could be adjusted so that they add up to the 
Annual Census of Manufacturers shipments figures. Another definition of benchmarking is 
the more general one of improving sub-annual estimates derived from one source with annual 
estimates obtained via a second source (see Hillmer and Trabelsi 1987). This definition assumes 
that the annual values are subject to error, which is not the case with the first definition. For 
example, the monthly inventories of Canadian Retailers derived from a sample survey could 
be improved using the end of year inventories obtained from the annual retail trade sample 
survey. This second definition of benchmarking corresponds to the situation encountered with 
many economic time series and is the one dealt with in this paper. 

The purpose of this article is twofold. First, it describes in detail the benchmarking problem
as it appears for many time series produced by large-scale economic surveys. Then, two
well-known benchmarking methods dealing with a single time series are presented and
discussed. Since both of these methods fail in some respects to resolve the problem, two 
other methods which use a non-linear weighted least squares approach are proposed. Finally, 
two of the above mentioned methods are illustrated with some simulated data and the results 
are discussed. 


2. PROBLEM DESCRIPTION 


The problem of improving a two-way table of sub-annual series of estimates with annual 
series of estimates from business surveys is described here, accompanied by the characteristics 
of the original data and a list of the features desired from a benchmarking procedure. 


¹ Normand Laniel and Kimberley Fyfe, Business Survey Methods Division, Statistics Canada, Ottawa, Ontario,
K1A 0T6.
The sub-annual estimates are often biased due to frame coverage deficiencies. Undercoverage
is caused by delay in the inclusion of new businesses and no representation of non-employer 
businesses (usually small ones) on the frame. These sub-annual estimates are usually derived 
from relatively small overlapping samples, implying that sampling variances are relatively large 
and that sampling covariances exist between sub-annual estimates of different time periods. 
In addition, most economic sub-annual surveys produce series of estimates for a number of 
industrial activities within a number of geographical regions. These are published sub-annually 
in the form of industry by geographical region tables, where the cells as well as the marginals 
and the grand totals need to be benchmarked. 

As regards annual estimates, they can be assumed to be unbiased since in practice their frames 
do not suffer much from coverage deficiencies. Also, the annual estimates usually come from 
relatively large samples or censuses and thus have relatively small or no sampling errors 
associated with them, while their sampling covariances tend to be large because of substantial 
sample overlap between years. Another point to note about the annual estimates is that these 
figures come in approximately two years after the time to which they refer. For example, annual 
data for 1988 will not be released until some time in 1990, while sub-annual data are usually 
available a few months after the time period to which they refer. Therefore, when the sub- 
annual estimates are to be benchmarked, there will be no annual benchmarks for some of the 
sub-annual periods. 

There are a number of features that a benchmarking procedure should have in order to be 
used for large scale survey estimates. First, the procedure should be simple enough that it can 
be used without too much data analysis. Second, it must be possible to produce preliminary 
benchmarking factors for periods for which benchmarks are not yet available. This feature 
allows benchmarking to be performed as the sub-annual data are produced. Otherwise discon- 
tinuities will be introduced in the sub-annual data. It is also desirable that the method maintain 
consistency between the grand-totals, marginal totals, and cell estimates for the benchmarked 
estimates in a table. 

More discussion on the last two features can be found in Laniel and Fyfe (1989) and (1990) 
and Cholette (1988a) and (1988b). The rest of this paper deals with the problem of bench- 
marking a single time series in the context described above. 


3. BENCHMARKING A SINGLE SERIES 


Four approaches to benchmarking a single time series of sub-annual flow or stock estimates 
are described in the following sub-sections. 


3.1 Denton’s Method 


In his 1971 paper, Denton proposed procedures for benchmarking based on a Quadratic 
Minimization approach, each of which corresponds to a specific penalty function. One of these 
penalty functions is the proportionate first difference between the original and benchmarked 
series and is often used for the problem of benchmarking time series that was described in 
section 2. Denton’s procedure can be presented in statistical terms by first stating that the 
sub-annual estimates follow the model: 


$$\frac{\theta_t}{y_t} - \frac{\theta_{t-1}}{y_{t-1}} = \varepsilon_t, \qquad t = 2, 3, \ldots, n, \qquad (3.1)$$

subject to the restriction to the annual data: 

$$z_T = \sum_{t \in T} \theta_t, \qquad T = 1, 2, \ldots, m, \qquad (3.2)$$

where: 
t refers to a sub-annual period, 
T refers to an annual period, 
$\{y_t\}$ is a sequence of biased estimates of the sub-annual parameters (levels), 
$\{\theta_t\}$ is a sequence of fixed sub-annual parameters (true values of the levels), 

$\{\varepsilon_t\}$ is a sequence of uncorrelated and identically distributed errors with mean vector and 
covariance matrix $(0, \sigma^2 I)$, and 

$\{z_T\}$ is a sequence of annual benchmarks. 


To find the benchmarked estimates, least squares are applied to the above restricted model. 


It is important to note that Denton’s approach assumes that the bias follows a random walk 
and that both the sub-annual and annual data are observed without sampling errors. Unfor- 
tunately, these assumptions are unlikely to be satisfied by economic time series (see section 2). 
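
As an illustration only, the restricted least squares problem described above can be sketched in a few lines of Python. The sketch assumes a flow series, a penalty on the proportionate first differences $\theta_t/y_t - \theta_{t-1}/y_{t-1}$, and annual benchmarks that must be met exactly; the function names, the Lagrange-multiplier formulation and the small linear solver are choices made here, not taken from Denton's paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def denton_proportional(y, z, periods_per_year):
    """Benchmark sub-annual flows y to annual totals z (hypothetical helper).

    Minimises sum_t (theta_t / y_t - theta_{t-1} / y_{t-1})^2 subject to the
    sub-annual values of each benchmarked year summing to its annual total.
    """
    n, m = len(y), len(z)
    # Quadratic form Q = D'D, where row t of D holds -1/y_{t-1} and +1/y_t.
    q = [[0.0] * n for _ in range(n)]
    for t in range(1, n):
        d = {t - 1: -1.0 / y[t - 1], t: 1.0 / y[t]}
        for i, di in d.items():
            for j, dj in d.items():
                q[i][j] += di * dj
    # First-order conditions: [2Q  A'; A  0] [theta; lambda] = [0; z].
    size = n + m
    k = [[0.0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            k[i][j] = 2.0 * q[i][j]
    for year in range(m):
        for t in range(year * periods_per_year, (year + 1) * periods_per_year):
            k[t][n + year] = 1.0
            k[n + year][t] = 1.0
    rhs = [0.0] * n + list(z)
    return solve_linear(k, rhs)[:n]


# Illustrative quarterly data (made up for this sketch).
quarterly = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 16.0]
benchmarks = [1.25 * sum(quarterly[:4]), 1.05 * sum(quarterly[4:])]
benchmarked = denton_proportional(quarterly, benchmarks, periods_per_year=4)
```

Note that the annual totals enter only as exact constraints, which is the sense in which this approach treats the benchmarks as free of sampling error.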


3.2 Hillmer and Trabelsi’s Method 


In 1987, Hillmer and Trabelsi proposed an alternative approach to benchmarking based 
on the Box-Jenkins (1976) ARIMA models. They assumed that the sub-annual estimates follow 
the model: 


$$y_t = \theta_t + u_t, \qquad t = 1, 2, \ldots, n, \qquad (3.3)$$


and the annual estimates follow the model: 


$$z_T = \sum_{t \in T} \theta_t + a_T, \qquad T = 1, 2, \ldots, m, \qquad (3.4)$$


where: 


$\{\theta_t\}$ is a sequence of stochastic sub-annual parameters (true values of levels) following an 
ARIMA model, 

$\{y_t\}$ is a sequence of unbiased estimates of the sub-annual parameters, 

$\{u_t\}$ is a sequence of sub-annual dependent sampling errors with mean vector and covariance 
matrix $(0, \Sigma_u)$, 

$\{z_T\}$ is a sequence of annual unbiased estimates, and 

$\{a_T\}$ is a sequence of annual dependent sampling errors with mean vector and covariance 
matrix $(0, \Sigma_a)$. 


Using the above models, they obtain the benchmarked sub-annual estimates by applying 
stochastic least squares. That is, they minimize $E(\hat{\theta}_t - \theta_t)^2$, the mean squared error. This 
technique is also referred to in time series terminology as signal extraction, and the derivation 
of the solution can be found in the paper written by Hillmer and Trabelsi. 

As stated in the models, this method takes into account the sampling variances and 
covariances of the sub-annual and annual estimates. Unfortunately, the approach does not 
accommodate biases in the sub-annual data. Also, since ARIMA modelling is being used in 
this method, it would be costly to implement for large scale surveys dealing with hundreds of 
series. Therefore it would be best to use this type of approach for only a small number of very 
important economic indicators. There would also be risks of oversmoothing the data if the 
ARIMA models are not properly specified. 

Cholette and Dagum (1989) modified the Hillmer and Trabelsi approach by introducing an 
"intervention" model instead of an ARIMA model. This allows the modelling of systematic 
effects in the time series, but according to the authors, this approach still possesses the same 
weaknesses as the original Hillmer and Trabelsi method. 


3.3 Model on Trends 


The following method was developed in an attempt to meet the benchmarking requirements 
of the economic surveys. It is based on the assumption that the sub-annual estimates follow 
the model: 


$$\frac{y_t}{y_{t-1}} = \frac{\theta_t}{\theta_{t-1}} + v_t, \qquad t = 2, 3, \ldots, n, \qquad (3.5)$$


and the annual estimates follow model (3.4), where: 


$\{\theta_t\}$ is a sequence of sub-annual parameters (true values), as in Denton’s method, 


$\{v_t\}$ is a sequence of dependent sub-annual sampling errors of the trends with mean vector 
and covariance matrix $(0, \Sigma_v)$. 


Least squares theory is applied to the above models to produce benchmarked estimates. The 
description of the Gauss-Newton algorithm necessary to solve this problem and the calcula- 
tion of the covariance matrix of the benchmarked estimates are given in Laniel and Fyfe 
(1989) or (1990). 


This method can be used when the benchmarks come from either a census or annual over- 
lapping samples and when the sub-annual level estimates are biased, if the relative bias is a 
constant. The assumption of a constant relative bias will hold in practice if the rate of 
the frame maintenance activities is relatively stable, that is, when the proportion of frame 
coverage deficiencies is fairly constant over time. This assumption also implies that the under- 
covered businesses behave like the ones covered by the frame. 

One technical problem with this method is that the sampling variance-covariance matrix 
of the sub-annual trends cannot be calculated directly and an approximation has to be used. 
The first-order Taylor approximation has been tried but in some cases the resulting sampling 
variances and covariances were zero or negative when they should be positive. For this reason, 
an alternative model to (3.5) is presented in the next section. 


3.4 Model on Levels 


The following method is an alternative to the previous one and is suggested so that the 
sampling variance-covariance matrix of the sub-annual estimates would be easier to obtain. 
It assumes that the sub-annual estimates follow the model: 


$$y_t = \alpha \theta_t + u_t, \qquad t = 1, 2, \ldots, n, \qquad (3.6)$$


where $\alpha$ is a fixed parameter taking into account the constant relative bias and $u_t$ is the same 
as for equation (3.3). The annual estimates follow model (3.4). 

Benchmarked estimates are obtained by applying least squares theory to the above models. 
The algorithm required to solve this problem is the same as for method 3.3. 
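
One way to write the least squares criterion implied by models (3.6) and (3.4) is sketched below. It is an illustration under stated assumptions rather than the authors' own formulation: the sub-annual errors $u$ and the annual errors $a$ are taken to be mutually uncorrelated, $\Sigma_u$ and $\Sigma_a$ denote their covariance matrices, $y$, $\theta$ and $z$ are the stacked vectors of sub-annual estimates, sub-annual parameters and annual estimates, and $J$ is an $m \times n$ indicator matrix introduced here with $J_{Tt} = 1$ if $t \in T$ and $0$ otherwise.

```latex
\min_{\alpha,\,\theta}\;
  (y - \alpha\theta)' \, \Sigma_u^{-1} \, (y - \alpha\theta)
  \;+\;
  (z - J\theta)' \, \Sigma_a^{-1} \, (z - J\theta)
```

Because $\alpha$ multiplies $\theta$, the criterion is non-linear in the unknowns, which is consistent with the need for an iterative algorithm such as Gauss-Newton. There are $n + 1$ unknowns ($\alpha$ and the $n$ values $\theta_t$) against $n + m$ observations.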


3.5 Discussion 


Among the methods reviewed here, the most appropriate one for benchmarking a single 
time series in the context of the large scale surveys is the new approach based on the model 
on levels. It has a statistical basis which allows us to calculate confidence regions and test the 
goodness of fit of the benchmarked model. To test for lack of fit one has to be careful in 
choosing a test since the benchmarked estimates, $\hat{\theta}_t$, have quite a small number of degrees of 
freedom, m — 1 (the number of annual observations minus one), in comparison to the number 
of observations, n + m. This small number of degrees of freedom also suggests that with the 
model on levels, we can expect to get benchmarked estimates with a chronological pattern 
similar to the one observed in the sub-annual data. 

A current practical issue with benchmarking methods which take into account sampling 
errors such as in 3.4, is the derivation of sampling covariances between two level estimates 
corresponding to two different time periods. Should they be calculated directly using the 
sample design for all pairs of time periods or should they be modelled? From a theoretical 
point of view, it is better to calculate these directly, since the sequence of sampling errors is 
intrinsically a non-stationary stochastic process due to the population variance-covariance 
varying with time. However, calculating all sampling covariances can be cumbersome, thus 
leaving the issue of how to obtain sampling covariances still an open question. 


3.6 An Example 


As a comparison between Denton’s method described in section 3.1 and the model on 
levels approach suggested in section 3.4, these two methods were applied to a special and 
interesting benchmarking case. It is a situation where the annual estimates have sampling 
variances six times the size of the sampling variances of the corresponding monthly estimates. 
In such a case, the advantage of using the model on levels approach instead of Denton’s method 
will be easily observed. 

The special case, though possible in practice, was made up of simulated data. Firstly, twenty- 
four monthly estimates were taken from an existing economic survey. A sampling covariance 
matrix was arbitrarily given to these monthly estimates. The variances and covariances were 
calculated using an equal coefficient of variation through time and the following correlation 
pattern: 


$$\rho_{ij} = 1 - \frac{|i - j|}{24} \qquad \text{for } i = 1, 2, \ldots, 24 \text{ and } j = 1, 2, \ldots, 24,$$


where $i$ and $j$ are the indexes of a pair of monthly estimates. Then, two corresponding annual 
estimates were constructed as follows. The first annual figure was 25% larger than the sum 
of the first twelve monthly figures, whereas the second annual figure was only 5% larger than the 
total of the last twelve monthly observations. The two annual estimates were given sampling 
variances equal to six times the variances of the corresponding monthly totals and their 
correlation was fixed at 0.5. 
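
The construction of the simulated covariance matrices can be sketched as follows. This is a minimal illustration, assuming the correlation pattern $1 - |i - j|/24$ for the 24 monthly estimates; the coefficient of variation and the monthly values used in the usage lines are placeholders, since the paper does not report them, and the function names are introduced here.

```python
def monthly_covariance(estimates, cv):
    """Covariance matrix with equal CV and correlation 1 - |i - j| / n."""
    n = len(estimates)
    sd = [cv * e for e in estimates]  # equal coefficient of variation
    return [[(1 - abs(i - j) / n) * sd[i] * sd[j] for j in range(n)]
            for i in range(n)]


def total_variance(cov, months):
    """Variance of the sum of the monthly estimates indexed by `months`."""
    return sum(cov[i][j] for i in months for j in months)


def annual_covariance(cov, factor=6.0, corr=0.5):
    """Annual variances set to `factor` times those of the monthly totals."""
    v1 = factor * total_variance(cov, range(0, 12))
    v2 = factor * total_variance(cov, range(12, 24))
    c = corr * (v1 * v2) ** 0.5
    return [[v1, c], [c, v2]]


# Placeholder inputs: 24 equal monthly estimates and a 5% CV.
monthly = [100.0] * 24
sigma_u = monthly_covariance(monthly, cv=0.05)
sigma_a = annual_covariance(sigma_u)
```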

The monthly estimates are represented by the full continuous line and the annual estimates 
by the horizontal lines on figure 3.1. The two horizontal lines are equal to the values of the 
annual figures divided by twelve. On the same figure, the line with long dots represents the 
monthly series benchmarked with the approach based on the model on levels. The line with 
short dots is the benchmarked monthly series with Denton’s method. 

From figure 3.1, it can be observed that the series benchmarked with the model on levels 
approach has the same year-to-year movement as the original monthly series, whereas the series 
benchmarked with Denton’s method has the same year-to-year movement as the annual 
estimates. It can also be seen that both benchmarked series are above the original monthly series. 

The difference in the year-to-year movement between the two benchmarked series can be 
explained as follows. The approach based on the model on levels gives the benchmarked series 
a year-to-year movement essentially obtained by weighting the annual and sub-annual data 
with the inverse of their sampling variances. Since, in this example, the sub-annual estimates 
are much more reliable than the annual estimates, the benchmarked series got the year-to-year 
movement of the monthly figures. With Denton’s method, by contrast, the year-to-year movement 
of the benchmarked series is constrained to that of the annual series regardless of its reliability. 
In this sense the approach based on the model on levels is better than Denton’s method. 

As a last comment on this example, the fact that both benchmarked series are above the 
original monthly series simply illustrates that both methods are providing a correction for the 
bias of the monthly estimates. 

Figure 3.1 Plot of the original and two benchmarked monthly series and of the annual series 


4. CONCLUSION 


The problem of improving sub-annual survey estimates with the use of annual survey 
estimates has been examined. A new and simple procedure to benchmark a single time series 
has been presented. This procedure could be implemented in a computer system to allow its 
use in an automated mode. The advantage of the procedure over more traditional methods 
(e.g., Denton’s) is that it takes account of sampling errors. Some issues in using the proposed 
procedure for benchmarking a single time series have been discussed. Two important prac- 
tical questions have been pointed out: benchmarking a table of series and preliminary bench- 
marking. Approaches to address these two topics have to be explored. 


Forgot the Sampling Scheme at the Estimation Stage? 


SHIBDAS BANDYOPADHYAY¹ 


ABSTRACT 


For a class of linear unbiased estimators in a class of sampling schemes, it is shown that one can forget 
the weights used for sample selection while estimating a population ratio by a ratio of two unbiased 
estimators, respectively of the numerator and the denominator defining the population ratio. This class 
of schemes includes commonly used sampling schemes such as unequal probability sampling with or 
without replacement, stratified proportional allocation sampling with unequal selection probabilities and 
without replacement in each stratum, etc. 


KEY WORDS: Ratio of unweighted totals; Symmetric sampling. 


1. INTRODUCTION 


Let m be the number of adult literates among t adult members in a sample of n families 
drawn from a given population. Let the population adult literacy rate R be estimated as 
r = m/t. Similarly, for a two-way table giving percentage distribution of persons by age-group 
and sex, let a cell entry be estimated by a ratio (multiplied by 100 to make it a percentage) of 
the number of persons classified into the cell to the total number of persons, in the sample 
of n families. 

Irrespective of the method of selection of the families, this simple ratio of two unweighted 
totals for estimating a ratio or a percentage distribution is acceptable to many non-statistical 
users. Indeed, in some survey reports, tables giving percentage distributions or rates are so com- 
puted, as if the sampling scheme had been a self-weighting one. 

If, however, the sampling scheme for selecting the n families had been a (single stage) 
PPSWOR, one is expected to go about finding weighted totals for obtaining unbiased estimators 
of numerators and respective denominators before computing a ratio or a percentage distribution. 

This study shows that, for sampling schemes such as a single stage PPSWOR but without 
any further assumptions, 


(i) a ratio of two unweighted totals estimates the corresponding population ratio, as a ratio 
of an unbiased estimator of the numerator to an unbiased estimator of the respective 
denominator; 

(ii) there is a class of sampling schemes, other than self-weighting designs, for which (i) holds. 
This class includes one stage unequal probability, with or without replacement, sampling 
schemes and stratified proportional allocation sampling with unequal probability without 
replacement selection in each stratum. 


2. SYMMETRIC SAMPLING SCHEMES 


Consider a finite population consisting of N units $U_1, U_2, \ldots, U_N$. Let $Y_i$ and $X_i$ denote the 
values of two study variables, Y and X respectively, associated with the unit $U_i$, $i = 1, 2, \ldots, N$. 


¹ Shibdas Bandyopadhyay, Applied Statistics, Surveys and Computing Division, Indian Statistical Institute, 203 
Barrackpore Trunk Road, Calcutta 700 035, India. 

The problem is to estimate a rate or a ratio $R = T(Y)/T(X)$ where $T(Y) = Y_1 + Y_2 + \ldots + Y_N$, 
and $T(X)$ is similarly defined with the variable X. 


The usual procedure is to estimate T( Y) and T(X) unbiasedly and take their ratio to estimate 
R. The aim of this paper is to follow the same procedure in such a way that the ratio becomes 
free of the selection probabilities of the sample units. 


Fix a sampling scheme. 


Let S denote the set consisting of all possible samples such that $p(s) > 0$, where $p(s)$ denotes 
the probability of drawing the sample s, and $\sum_{s \in S} p(s) = 1$. 


For $s \in S$ and $i = 1, 2, \ldots, N$, 


$n(i,s)$ = the number of times $U_i$ is included in s, and $a_i = \sum_{s \in S} n(i,s)$, the number of times 
$U_i$ is included in all possible samples. 


S, $p(s)$ and $a_i$ depend on the sampling scheme. 


Definition 2.1. A sampling scheme is said to be symmetric if $a_i = a$ for all $i = 1, 2, \ldots, N$. 


The following estimator, based on the sample s, in the class of linear unbiased estimators 
of Godambe (1955) for T( Y), was studied by Bandyopadhyay et al. (1977). 


$$T(Y,s) = \sum_{i=1}^{N} Y_i \, n(i,s) \, a_i^{-1} \Big/ p(s). \qquad (2.1)$$


Clearly, $T(Y,s)$ is unbiased for $T(Y)$. An estimator of the ratio $R = T(Y)/T(X)$, as a 
ratio of an unbiased estimator of $T(Y)$ to an unbiased estimator of $T(X)$, based on a sample 
s, is 


$$R(s) = T(Y,s)/T(X,s) = \sum_{i=1}^{N} Y_i \, n(i,s) \, a_i^{-1} \Big/ \sum_{i=1}^{N} X_i \, n(i,s) \, a_i^{-1}. \qquad (2.2)$$


For symmetric sampling schemes, $a_i = a$ for all i and (2.2) becomes 


$$R(s) = \sum_{i=1}^{N} Y_i \, n(i,s) \Big/ \sum_{i=1}^{N} X_i \, n(i,s) = \frac{\text{unweighted total of Y values in the sample}}{\text{unweighted total of X values in the sample}} \qquad (2.3)$$

and the above observations are summarized in the following theorem. 

Main theorem. For a symmetric sampling scheme, a ratio of two unweighted totals estimates 
the corresponding population ratio as a ratio of an unbiased estimator of the numerator to 
an unbiased estimator of the respective denominator, but the estimated ratio does not involve 
the selection probabilities of the population units in the sample. 


It may be noted that the inclusion probabilities of the units in the sample need not be equal 
for symmetric sampling schemes. Thus, symmetric sampling schemes need not be self-weighting. 
Self-weighting designs require constancy of $a_i p(s)$ for all i and s, and constancy of $a_i p(s)$ 
for all i and s does not make the sampling scheme symmetric. 

For a non-symmetric scheme, (2.2) is easy to compute as the $a_i$’s are easy to compute in most 
cases and there is no need to compute inclusion probabilities. 

For without replacement sampling of n units, there are $\binom{N-1}{n-1}$ (un-ordered) samples 
containing a given unit $U_i$, so $a_i = \binom{N-1}{n-1}$ for all i and thus, in particular, PPSWOR is 
symmetric. It may be noted that not all PPSWOR schemes result in $\binom{N}{n}$ possible samples. As 
noted in Connor (1966), in some cases systematic PPS samples in a pre-determined order or 
randomized PPS systematic sampling may result in zero probability for some set of n units. 
The result applies if the PPSWOR scheme is such that no joint inclusion probability of any 
set of n units is zero. 

For with replacement sampling of n units, there are $N^n$ (ordered) samples and so 
$a_i = n N^{n-1}$ for all i and thus, in particular, PPSWR is symmetric. 
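
The two counting results can be checked by brute-force enumeration for a small population. The sketch below is illustrative only: the population size, sample size and function name are chosen here, and samples are represented as tuples of unit indexes.

```python
from itertools import combinations, product
from math import comb


def inclusion_counts(samples, population_size):
    """a_i: number of times unit i appears over all possible samples."""
    counts = [0] * population_size
    for sample in samples:
        for unit in sample:
            counts[unit] += 1  # a unit drawn twice is counted twice
    return counts


N, n = 4, 2

# Without replacement: un-ordered samples of n distinct units.
a_wor = inclusion_counts(combinations(range(N), n), N)

# With replacement: ordered samples, repeats allowed.
a_wr = inclusion_counts(product(range(N), repeat=n), N)

print(a_wor, comb(N - 1, n - 1))   # every a_i equals C(N-1, n-1)
print(a_wr, n * N ** (n - 1))      # every a_i equals n * N^(n-1)
```

In both cases the counts are the same for every unit, which is what makes the schemes symmetric whatever selection probabilities are attached to the samples.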

For PPSWOR in each of k strata, the a-value for each unit in the jth stratum is 


$$\frac{n_j}{N_j} \prod_{l=1}^{k} \binom{N_l}{n_l}$$



which becomes a constant when allocation is proportional and if no joint probability of any 
set of units in any stratum is zero, where $N_j$ and $n_j$ are respectively the population and sample 
sizes for the jth stratum, j = 1, 2, ..., k. Similar allocation may be made to make a multistage 
sampling scheme symmetric. 

For PPSWR sampling, it may be noted that the unbiased estimator of T( Y) given by (2.1) is 
inadmissible. This estimator can be improved upon by putting $n^*(i,s)$ and $a_i^*$ respectively for 
$n(i,s)$ and $a_i$, where $n^*(i,s)$ is 1 if $n(i,s)$ is at least 1 and $n^*(i,s)$ is zero if $n(i,s)$ is zero, and 
$a_i^*$ is $a_i$ defined with $n^*(i,s)$. Here, $a_i^* = N^n - (N - 1)^n$, the number of (ordered) samples 
containing a given unit $U_i$. It has not been possible to obtain a mathematical expression for 
relative efficiency in a closed form for comparison, even with respect to PPSWR schemes. 
study is included for comparison with PPSWOR scheme. Another attractive possibility is to 
study large sample variance and bias using Taylor series expansions. 

It is clear that it is not possible to estimate the variance of R(s) without the weights or 
further assumptions. However, if $s_1$ and $s_2$ are two half-samples drawn by the same symmetric 
sampling scheme (like two independent PPSWOR samples of equal size), R is estimated as 
$[R(s_1) + R(s_2)]/2$, and its unbiased variance estimator is $[R(s_1) - R(s_2)]^2/4$. 
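
The half-sample calculation is simple enough to write out directly. The following sketch uses function names and sample values chosen here for illustration; each half-sample is a pair of lists holding the Y and X values of the sampled units.

```python
def unweighted_ratio(y_values, x_values):
    """R(s): ratio of the unweighted sample totals of Y and X."""
    return sum(y_values) / sum(x_values)


def half_sample_estimate(r1, r2):
    """Combined estimate of R and its variance estimate from two half-samples."""
    estimate = (r1 + r2) / 2
    variance = (r1 - r2) ** 2 / 4
    return estimate, variance


# Made-up values for two independent half-samples.
r1 = unweighted_ratio([3.0, 5.0], [2.0, 2.0])
r2 = unweighted_ratio([6.0, 6.0], [1.0, 2.0])
estimate, variance = half_sample_estimate(r1, r2)
```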

If T(X) is known, a ratio-type estimator for T( Y) is T(X) T( Y,s)/T(X,s), which may be 
improved as in Bandyopadhyay (1980) depending on whether or not the sampling fraction is 
more than half. 

When the population units are divided into k non-overlapping clusters and the selection probability 
of the jth cluster is $p_j$, then the design becomes symmetric with $a_i = 1$ for all units in 
all the clusters. It may be noted that the sample size is the size of the selected cluster and so, 
the symmetric sampling schemes need not be fixed sample size designs. 


3. EMPIRICAL STUDY ON BIAS AND MEAN SQUARE ERROR 


Yates and Grundy (1953) considered the following three hypothetical populations, each with 
4 population units. 


          Population A            Population B            Population C 
X:    0.1  0.2  0.3  0.4      0.1  0.2  0.3  0.4      0.1  0.2  0.3  0.4 
Y:    0.5  1.2  2.1  3.2      0.8  1.4  1.8  2.0      0.2  0.6  0.9  0.8 


The sampling scheme is to draw a sample of size n = 2 by PPSWOR using X-values as size 
measure. It is proposed to compare bias and mean square error of R(s) with those of $R_{HT}(s)$, 
where $R_{HT}(s)$ is the ratio of the Horvitz-Thompson (1952) estimator of T(Y) to that of T(X). 
The result of the comparison is presented below. 


Populations:                                   A           B           C 
Relative bias of R(s)                      0.02456    -0.02785    -0.00496 
Relative bias of R_HT(s)                  -0.00379     0.00552     0.00232 
MSE of R(s)                                0.2946      0.2946      0.0824 
MSE of R_HT(s)                             0.3159      0.3642      0.0690 
Relative efficiency of R(s) to R_HT(s)     1.0723      1.2302      0.8374 


Though the absolute bias of R(s) relative to R is more than that of $R_{HT}(s)$ for the three 
populations, differences are small. R(s) is a more efficient estimator in populations A and 
B and $R_{HT}(s)$ is more efficient in population C. 

Since the above three populations are more extreme than the situations usually met with 
in practice, it is anticipated that R(s) may be useful when the sampling scheme is not available 
at the estimation stage. 
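
A comparison of this kind can be reproduced by enumerating all samples of size 2. The sketch below is hedged in several ways: it assumes PPSWOR is carried out by two successive draws with probabilities proportional to X among the remaining units, which the paper does not spell out; the Y values are those read from the population table above; and the function names are introduced here. Because of the assumption about the draw mechanism, the printed figures need not agree with the published table to every digit.

```python
from itertools import permutations


def ppswor_pair_probs(p):
    """Probability of each un-ordered pair under two successive PPS draws."""
    probs = {}
    for i, j in permutations(range(len(p)), 2):
        pair = (min(i, j), max(i, j))
        probs[pair] = probs.get(pair, 0.0) + p[i] * p[j] / (1.0 - p[i])
    return probs


def compare_estimators(x, y):
    """Relative bias and MSE of R(s) and of the Horvitz-Thompson ratio."""
    total_x = sum(x)
    p = [v / total_x for v in x]
    pairs = ppswor_pair_probs(p)
    # First-order inclusion probabilities from the pair probabilities.
    incl = [sum(pr for s, pr in pairs.items() if i in s) for i in range(len(x))]
    true_ratio = sum(y) / total_x

    def unweighted(s):
        return sum(y[i] for i in s) / sum(x[i] for i in s)

    def horvitz_thompson(s):
        return sum(y[i] / incl[i] for i in s) / sum(x[i] / incl[i] for i in s)

    summary = {}
    for name, est in (("R", unweighted), ("R_HT", horvitz_thompson)):
        mean = sum(pr * est(s) for s, pr in pairs.items())
        mse = sum(pr * (est(s) - true_ratio) ** 2 for s, pr in pairs.items())
        summary[name] = {"relative_bias": (mean - true_ratio) / true_ratio,
                         "mse": mse}
    return summary


x_values = [0.1, 0.2, 0.3, 0.4]
populations = {
    "A": [0.5, 1.2, 2.1, 3.2],
    "B": [0.8, 1.4, 1.8, 2.0],
    "C": [0.2, 0.6, 0.9, 0.8],
}
for label, y_values in populations.items():
    print(label, compare_estimators(x_values, y_values))
```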


Estimation of Panel Correlations for the 
Canadian Labour Force Survey 


HYUNSHIK LEE¹ 


ABSTRACT 


The Canadian Labour Force Survey uses the rotation panel design. Every month, one sixth of the sample 
rotates and five sixths remain. Hence, under this rotation scheme, once a rotation panel enters in the 
sample, it stays 6 months in the sample before it rotates out. Because of this design feature and the way 
of selecting the rotate-in panel, the estimates based on the panels in the same or different months are 
correlated. The correlation between two panel estimates is called the panel correlation. Three kinds of 
panel correlations are defined in this paper: (1) the correlation (denoted by $\rho$) between estimates for 
the same characteristic based on the same panel in different months; (2) the correlation (denoted by $\gamma$) 
between estimates of the same characteristic based on geographically neighboring panels in different 
months; (3) the correlation (denoted by $\tau$) between estimates of different characteristics based on the 
same panel in the same or different months. This paper describes a methodology for estimating these 
panel correlations and presents estimated correlations for selected variables using 1980-81 and 1985-87 
data with some discussion. 


KEY WORDS: Repeated panel survey; Rotation; Taylor method. 


1. INTRODUCTION 


The Labour Force Survey (LFS) is a continuing monthly household survey which employs 
a rotating panel design. The sample consists of six equal size rotation panels, one of which is 
replaced by a new panel each month. The rotated-in panel stays in the sample for six months 
before it rotates out from the sample. (For a detailed description of the LFS methodology, readers 
are referred to Platek and Singh (1976) and Singh et al. (1990).) Therefore, the estimates based 
on the same panel consisting of the same sampling units in different months are highly cor- 
related. Moreover, an outgoing rotation panel is usually replaced by a neighboring panel. 
Because they are geographically close, estimates based on these neighboring rotation panels 
are also correlated. These correlations are called panel correlations. In this paper, we will 
describe and discuss how the panel correlations can be estimated and present their estimates 
for selected variables. This work originated in a study of the composite estimation technique. 
However, the results are applicable in any situation where the panel correlation plays a role. 

The paper is structured as follows. In Section 2, necessary definitions, notations and assump- 
tions are given. Methodology is described in Section 3 and results and discussion are given in 
Section 4. 


2. DEFINITIONS OF PANEL CORRELATION COEFFICIENTS 


To define various panel correlations we need to define common panels and the predecessor 
panel. A panel is identified by the panel number which indicates the duration of the panel in the 
sample. Thus, Panel 1 in month m becomes Panel 2 in month m + 1, Panel 3 in month m + 2, 

Table 1 
Common and Predecessor Panels Pertaining to Months m and m − j 

Panel in                              Panel in month 
month m   m-1   m-2   m-3   m-4   m-5   m-6   m-7    m-8    m-9    m-10   m-11 

   1      (6)   (5)   (4)   (3)   (2)   (1)   ((6))  ((5))  ((4))  ((3))  ((2)) 
   2       1    (6)   (5)   (4)   (3)   (2)   (1)    ((6))  ((5))  ((4))  ((3)) 
   3       2     1    (6)   (5)   (4)   (3)   (2)    (1)    ((6))  ((5))  ((4)) 
   4       3     2     1    (6)   (5)   (4)   (3)    (2)    (1)    ((6))  ((5)) 
   5       4     3     2     1    (6)   (5)   (4)    (3)    (2)    (1)    ((6)) 
   6       5     4     3     2     1    (6)   (5)    (4)    (3)    (2)    (1) 

Note: Single and double parentheses indicate single and double predecessors, respectively. 


and so on. Another term, rotation group, is often used to identify a panel regardless of its dura- 
tion in the sample. For instance, Rotation Group 1 which rotates in in January is identified 
as Rotation Group 1 throughout its stay in the sample until it rotates out in July. Then, Panel 
1 in January indicates Rotation Group 1 and Panel 2 in February indicates the same rotation 
group which is now two months old and so on. 

Two panels in two different months which represent the same rotation group are called 
common panels. When a rotation group rotates out, it is usually replaced by a rotation group 
consisting of neighboring households and given the same rotation group number. A panel 
associated with the out-going rotation group is called a predecessor panel of a panel associated 
with the in-coming rotation group. Therefore, in the above example, Panel 6 in June which 
is associated with Rotation Group 1 is a predecessor panel of Panel 1 in July. Table 1 shows 
schematically the common and predecessor panels pertaining to given months m and m − j. 
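
The rule that generates the entries of Table 1 can be sketched as a small function. This is an illustration derived from the definitions above (six panels, one rotating each month); the function name and the returned labels are introduced here.

```python
def panel_relation(panel, months_back):
    """Relation between P(m, panel) and the same rotation slot j months back.

    Returns a label and the panel number held in month m - j by the rotation
    group that is common to, a predecessor of, or a double predecessor of
    P(m, panel). Covers the range shown in Table 1 (j = 1, ..., 11).
    """
    if not (1 <= panel <= 6 and 1 <= months_back <= 11):
        raise ValueError("panel must be in 1..6 and months_back in 1..11")
    if months_back < panel:
        return ("common", panel - months_back)
    if months_back < panel + 6:
        return ("predecessor", panel - months_back + 6)
    return ("double predecessor", panel - months_back + 12)


# Row for Panel 4 in month m, columns m-1, ..., m-11.
row_4 = [panel_relation(4, j) for j in range(1, 12)]
```

For example, `panel_relation(4, 1)` gives the common panel 3, and `panel_relation(1, 1)` gives predecessor panel 6, matching the January and July example in the text.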

Since each panel can be identified by two components, month and panel number, let 
P(month, panel number) denote a panel. Then P(m, 4) and P(m − 1, 3), for instance, are 
common panels 1 month apart. Similarly, P(m, 4) and P(m − 2, 2) are common panels 2 
months apart. The correlation coefficient of estimates of a characteristic based on common 
panels that are j months apart is denoted by $\rho_j$. Obviously, there are no common panels which 
are more than 5 months apart and thus, the subscript j can be at most 5. We assume that $\rho_j$ 
is independent of m and panel number. However, it is a function of j and varies between 
characteristics. 

The correlation coefficient of estimates based on a panel and its predecessor that are j months 
apart is denoted by $\gamma_j$. But in this case, j can go up to 11, i.e. $\gamma_{11}$ is the last correlation 
coefficient in this series and it is the correlation between P(m, 6) and P(m − 11, 1). We assume 
again that the $\gamma$’s are independent of m and panel number. They do, however, depend on 
characteristic as well as j as $\rho$-correlations do. 

The third type of panel correlation is defined as the correlation between estimates for two 
different characteristics based on common panels and denoted by $\tau_j$ for common panels that 
are j months apart. Now j can take values from 0 to 5. The same assumptions as for the $\rho$’s 
and $\gamma$’s apply here as well. 


The formal definitions of $\rho$’s, $\gamma$’s and $\tau$’s are as follows: 


Let $y_{m,l}$ be the LFS estimate of a characteristic of interest obtained from P(m,l). We assume 
that $V(y_{m,l}) = \sigma_y^2$ regardless of m and l. Then, $\rho_j$’s are defined by 


$$\mathrm{Cov}(y_{m,l}, y_{m-j,l-j}) = \rho_j \sigma_y^2, \qquad j = 1, 2, \ldots, 5; \quad l = j+1, \ldots, 6,$$

and $\gamma_j$’s by 

$$\mathrm{Cov}(y_{m,l}, y_{m-j,6+l-j}) = \gamma_j \sigma_y^2,$$

where $1 \le l \le j$ if $j \le 6$ and $j - 5 \le l \le 6$ if $7 \le j \le 11$. 


It would be natural to conjecture that $\rho_j$’s and $\gamma_j$’s decrease as the subscript j increases and 
that $\rho_j$’s are larger than $\gamma_j$’s because $\rho_j$’s are correlations pertaining to common households while 
$\gamma_j$’s are those pertaining to neighboring households. We can also define the correlation between 
a panel and the predecessor of the panel’s predecessor (denoted by double parentheses and 
called double predecessor in Table 1) in a similar way, say $\delta_j$, and thus, we have $\delta_7, \delta_8, \ldots, \delta_{17}$. 
They will be smaller than $\gamma_j$’s but could be quite close to them for the same subscript 
because double and single predecessors are close geographically. However, the $\delta$-correlations 
are not considered here due to time and resource constraints. 

We assume that $\mathrm{Cov}(y_{m,l}, y_{m,l'}) = 0$ if $l \ne l'$ and $\mathrm{Cov}(y_{m,l}, y_{m-j,l'}) = 0$ if P(m − j, l′) 
is neither a common panel nor a predecessor of P(m,l). 

In order to define $\tau$-correlations, let $x_{m,l}$ be the LFS estimate of another characteristic 
obtained from P(m,l) and let $V(x_{m,l}) = \sigma_x^2$ be independent of m and l. Then $\tau$-correlations 
are defined by 


$$\mathrm{Cov}(x_{m,l}, y_{m-j,l-j}) = \tau_j \sigma_x \sigma_y, \qquad 0 \le j \le 5, \quad j < l \le 6.$$


3. ESTIMATION OF THE PANEL CORRELATIONS 


Since a variance estimation computer program was available, the method described here 
was geared to use this program with minimum modification. The methodology used in the 
program is the generalized Keyfitz method (Choudhry and Lee 1987; Lee 1989a) better known 
as the Taylor method. The program can compute variance estimates of linear combinations 
of monthly estimates. 

We employ the following basic equality to estimate the desired correlations using the existing 
variance program: 


$$\mathrm{Cov}(A,B) = \frac{V(A) + V(B) - V(A - B)}{2}. \qquad (1)$$


From the program, V(A — B),V(A) and V(B) can be obtained and so can Cov(A,B) using 
(1). An expression for V(A — B) from which (1) can be obtained is also given in Kish (1965). 
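
As a minimal sketch of how equality (1) is applied, the following Python functions recover a covariance, and the implied correlation, from three variance estimates. The function names and the numerical inputs are hypothetical and serve only to illustrate the arithmetic; they are not output of the variance program.

```python
def covariance_from_variances(var_a, var_b, var_diff):
    """Cov(A, B) recovered from V(A), V(B) and V(A - B), as in equality (1)."""
    return (var_a + var_b - var_diff) / 2.0


def correlation_from_variances(var_a, var_b, var_diff):
    """Correlation of A and B implied by the same three variance estimates."""
    cov = covariance_from_variances(var_a, var_b, var_diff)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical variance estimates, chosen only to illustrate the arithmetic.
print(covariance_from_variances(4.0, 9.0, 7.0))
print(correlation_from_variances(4.0, 9.0, 7.0))
```

Only the three variances are needed, which is why a program that handles linear combinations of monthly estimates suffices.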


3.1 Estimation of ρ-Correlations


Let $A = \sum_{l=2}^{6} y_{m,l}$ and $B = \sum_{l=1}^{5} y_{m-1,l}$. A and B are obtained by eliminating Panel 1
from month m and Panel 6 from month m - 1, respectively. Note that the eliminated panels
are uncommon and the remaining ones are all common. Using the variance program we
compute estimates of V(A - B), V(A) and V(B) and obtain estimates of Cov(A,B) by (1).
From the assumptions given in Section 2, it is easy to see that


$$\mathrm{Cov}(A,B) = 5\rho_1\sigma_y^2, \qquad V(A) = V(B) = 5\sigma_y^2,$$

and thus,

$$\rho_1 = \frac{\mathrm{Cov}(A,B)}{\sqrt{V(A)\,V(B)}}. \qquad (2)$$

An estimate of $\rho_1$ is then obtained by substituting estimates of Cov(A,B), V(A) and V(B).
Estimates of $\rho_2$, $\rho_3$ and $\rho_4$ can be obtained in the same way by putting
$A = \sum_{l=j+1}^{6} y_{m,l}$ and $B = \sum_{l=1}^{6-j} y_{m-j,l}$, j = 2, 3, 4. Estimating $\rho_5$ this way,
however, is problematic. When we drop all uncommon panels from months m and m - 5, only one
panel is left in each month, and this causes a problem in variance estimation for
Self-Representing Units (SRUs). SRUs are large cities, each of which is represented in the
survey by independent sampling. There is no such problem for Non-Self-Representing Units
(NSRUs), which are the areas outside of the SRUs, containing rural areas and small urban
centers. In NSRUs, each Primary Sampling Unit (PSU), which becomes a replicate for variance
estimation, has all rotation panels and thus, even after eliminating 5 uncommon panels, there
is still one panel remaining in the PSU so that the variance can be computed. In SRUs, however,
rotation panels form replicates, and if there is only one panel left, then there is only one
replicate in each stratum and the variance cannot be computed in the usual way. Therefore,
$\rho_5$ was obtained by prediction using the nonlinear regression $\rho = a + bt + ce^{-t}$,
t = 1, ..., 4. Another way to estimate $\rho_5$ is discussed in Subsection 4.1.
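
The prediction step can be sketched as follows. The model a + bt + c exp(-t) is linear in the coefficients (a, b, c), so the sketch assumes an ordinary least-squares fit through the normal equations; the paper does not state which fitting criterion was used, and the helper names are hypothetical. The four input values are the lag 1 to 4 estimates for IN LF, Nova Scotia, 1980-81, taken from Table 2.

```python
import math


def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_rho_curve(lags, rhos):
    """Least-squares fit of rho = a + b*t + c*exp(-t); assumed criterion, see lead-in."""
    design = [[1.0, float(t), math.exp(-t)] for t in lags]
    normal = [[sum(row[i] * row[j] for row in design) for j in range(3)]
              for i in range(3)]
    rhs = [sum(row[i] * y for row, y in zip(design, rhos)) for i in range(3)]
    return solve_linear(normal, rhs)


def predict_rho(coefs, t):
    """Evaluate the fitted curve at lag t."""
    a, b, c = coefs
    return a + b * t + c * math.exp(-t)


# Lag 1-4 estimates for IN LF, Nova Scotia, 1980-81 (Table 2).
coefs = fit_rho_curve([1, 2, 3, 4], [0.862, 0.797, 0.744, 0.679])
print(predict_rho(coefs, 5))
```

The printed value can be compared with the lag 5 entry for the same row of Table 2.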


3.2 Estimation of γ-Correlations


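It is easy to see that $\mathrm{Cov}(A,B) = (5\rho_1 + \gamma_1)\sigma_y^2$ if $A = \sum_{l=1}^{6} y_{m,l}$ and $B = \sum_{l=1}^{6} y_{m-1,l}$.
In general,


$$\mathrm{Cov}(A,B) = \{(6 - j)\rho_j + j\gamma_j\}\sigma_y^2,$$


where


$$A = \sum_{l=1}^{6} y_{m,l}, \qquad B = \sum_{l=1}^{6} y_{m-j,l}.$$

Then, an estimate of $\gamma_j$ can be obtained from the following equation:


$$\gamma_j = \frac{1}{j}\left[\frac{6\,\mathrm{Cov}(A,B)}{\sqrt{V(A)\,V(B)}} - (6 - j)\rho_j\right], \qquad (3)$$

by substituting estimated values on the right. There is a direct way to estimate these
γ-correlations, including $\gamma_5$, by


$$\gamma_j = \frac{\mathrm{Cov}(A_j,B_j)}{\sqrt{V(A_j)\,V(B_j)}}, \qquad (4)$$


where $A_j = \sum_{l=1}^{j} y_{m,l}$ and $B_j = \sum_{l=7-j}^{6} y_{m-j,l}$, j = 2, ..., 5. In Section 4, the two methods
are compared using empirical data.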
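
The two estimators can be sketched side by side in Python. The sketch takes formula (3) in the form that follows from Cov(A,B) = {(6 - j)ρ_j + jγ_j}σ_y² together with V(A) = V(B) = 6σ_y², which holds under the assumptions of Section 2. The function names are hypothetical, and the inputs in the check are synthetic values built from assumed correlations, not LFS estimates.

```python
def gamma_method1(cov_ab, var_a, var_b, rho_j, j):
    """Formula (3): indirect estimate using all six panels in months m and m - j.

    Assumes V(A) = V(B) = 6 * sigma_y^2, as implied by the Section 2 assumptions.
    """
    ratio = cov_ab / (var_a * var_b) ** 0.5
    return (6.0 * ratio - (6 - j) * rho_j) / j


def gamma_method2(cov_ab, var_a, var_b):
    """Formula (4): direct estimate using only panels paired with their predecessors."""
    return cov_ab / (var_a * var_b) ** 0.5


# Synthetic check with sigma_y^2 = 1, rho_2 = 0.8, gamma_2 = 0.2 (assumed values).
rho, gamma, j = 0.8, 0.2, 2
cov_all_panels = (6 - j) * rho + j * gamma   # covariance of the two six-panel sums
cov_paired = j * gamma                       # covariance of the two j-panel sums
print(gamma_method1(cov_all_panels, 6.0, 6.0, rho, j))
print(gamma_method2(cov_paired, float(j), float(j)))
```

When the model assumptions hold exactly, both functions return the same value, so a gap between them on real data points to a departure from those assumptions.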
Other γ-correlations ($\gamma_j$, j = 6, ..., 10) are obtained by (4) with


$$A_j = \sum_{l=j-5}^{6} y_{m,l}, \qquad B_j = \sum_{l=1}^{12-j} y_{m-j,l}.$$


There is no simple way of estimating $\gamma_{11}$ directly or indirectly. Both $\gamma_5$ and $\gamma_{11}$ were
predicted by a log-linear model $\gamma = \exp(a + bt)$, t = 1, ..., 4, 6, ..., 10.


3.3 Estimation of τ-Correlations


These correlations can be estimated in the same way as the ρ-correlations, just by replacing
$y_{m,l}$ by $x_{m,l}$ in A. Let $A = \sum_{l=j+1}^{6} x_{m,l}$ and $B = \sum_{l=1}^{6-j} y_{m-j,l}$, j = 0, 1, ..., 4. Then we have


$$\mathrm{Cov}(A,B) = (6 - j)\tau_j\sigma_x\sigma_y,$$
$$V(A) = (6 - j)\sigma_x^2,$$
$$V(B) = (6 - j)\sigma_y^2,$$
from which we get


$$\tau_j = \frac{\mathrm{Cov}(A,B)}{\sqrt{V(A)\,V(B)}}. \qquad (5)$$


All τ's can be estimated using (5) except $\tau_5$, which is predicted by a log-linear model,
$\tau = \pm\exp(a + bt)$, t = 0, 1, ..., 4.


4. RESULTS AND DISCUSSION 


By using the methods discussed in the previous section, estimates of ρ- and γ-correlations
were computed from the 1980-81 and 1985-87 LFS data for 5 characteristics: In Labour Force
(IN LF), Employed (EMP), Employed Agriculture (EMP AG), Employed Non-Agriculture
(EMP NON-AG) and Unemployed (UNEMP). From the 1980-81 data, the panel correlations were
estimated for only 3 provinces: Nova Scotia (NS), Ontario (ONT) and British Columbia (BC).
However, the estimation was extended to all provinces when more recent data (March
1985 - February 1987) were used. Moreover, 4 more characteristics, the employed and the
unemployed of two age groups, 15-24 and 25+ (EMP 15-24, EMP 25+, UNEMP 15-24,
UNEMP 25+), were added. The estimation of τ-correlations was done only for those additional
characteristics for NS, ONT and Alberta (ALT) from the 1985-87 data.

In the following, only part of these results will be presented and discussed. All the results 
are available in Lee (1989b). 

4.1 Estimates of ρ-Correlations


The estimated ρ-correlations are given in Table 2. Even though estimates for the
5 characteristics (IN LF, EMP, EMP AG, EMP NON-AG, UNEMP) from the 1985-87 data
are available for all provinces, only the results for 3 provinces, NS, ONT and BC, are presented,
to allow a historical comparison. Table 2 also shows the results for the other 4 characteristics
(EMP 15-24, EMP 25+, UNEMP 15-24, UNEMP 25+) for the provinces of NS and ONT.

The ρ-correlations are generally high, as expected, because they are correlations for the common
panels. The correlations for EMP AG are the highest and those for UNEMP are the lowest.
It seems that the size of the ρ-correlation indicates the degree of mobility of the labour force
with a particular characteristic. For instance, the high ρ-correlation for EMP AG shows a low
mobility of the labour force in agriculture, while the low ρ-correlation for UNEMP demonstrates
a high mobility of the unemployed labour force. The different levels of mobility of the labour
force in the two age groups are also evident: the younger group (15-24) is more mobile than the
older one (25+).

The decreasing trend of the ρ-correlations over time is clearly demonstrated in the results.
The trend was extremely well fitted by the nonlinear regression model $\rho_t = a + bt + ce^{-t}$. The
R-squares (multiple correlations) are close to 1 (> 0.98). Therefore, the predicted values for
$\rho_5$ seem to be very good. In Lee (1989a and 1989b), $\rho_5$ was obtained by extrapolating $\rho_3$ and
$\rho_4$ instead. The differences between the predicted and extrapolated values for $\rho_5$, however, are
very small. They are less than 0.01 for all characteristics except for UNEMP, UNEMP 15-24
and UNEMP 25+, where the largest difference is 0.03.


Table 2
Estimates of ρ-Correlations (1980-81 and 1985-87 Data)
(? marks a value that is illegible in the source text)

                                  80-81 Data                            85-87 Data
Prov  Characteristic      ρ1     ρ2     ρ3     ρ4     ρ5        ρ1     ρ2     ρ3     ρ4     ρ5
NS    IN LF             0.862  0.797  0.744  0.679  0.622     0.845  0.769  0.730  0.696  0.670
      EMP               0.866  0.783  0.714  0.651  0.590     0.863  0.768  0.713  0.686  0.660
      EMP AG            0.913  0.837  0.756  0.678  0.598     0.912  0.867  0.825  0.802  0.773
      EMP NON-AG        0.865  0.774  0.710  0.649  0.594     0.873  0.779  0.724  0.697  0.670
      UNEMP             0.590  0.455  0.333  0.243  0.145     0.703  0.546  0.426  0.415  0.375
      EMP 15-24                                               0.773  0.632  0.556  0.495  0.446
      EMP 25+                                                 0.878  0.800  0.754  0.729  0.705
      UNEMP 15-24                                             0.618  0.454  0.364  0.300  0.246
      UNEMP 25+                                               0.695  0.554  0.443  0.440  0.406
ONT   IN LF             0.843  0.782  0.717  0.674  0.622     0.846  0.781  0.732  0.681  0.635
      EMP               0.852  0.779  0.709  0.664  0.611     0.853  0.771  0.706  0.648  0.592
      EMP AG            0.955  0.926  0.901  0.861  0.827     0.962  0.948  0.944  0.937  0.934
      EMP NON-AG        0.861  0.791  0.724  0.678  0.625     0.866  0.795  0.746  0.701  0.660
      UNEMP             0.580  0.445  0.334  0.286  0.222     0.579  0.436  0.328  0.291  0.238
      EMP 15-24                                               0.747  0.605  0.500  0.429  0.356
      EMP 25+                                                 0.883  0.824    ?    0.732  0.691
      UNEMP 15-24                                               ?    0.339  0.257  0.219  0.178
      UNEMP 25+                                               0.622  0.468    ?      ?    0.256
BC    IN LF             0.849  0.767  0.705  0.665  0.622     0.817    ?      ?    0.647  0.597
      EMP               0.835  0.755  0.695  0.651  0.607     0.851    ?      ?    0.651  0.597
      EMP AG            0.896  0.809  0.733  0.656  0.582     0.938  0.886  0.847  0.828  0.805
      EMP NON-AG        0.855  0.769  0.715  0.661  0.616     0.857  0.784  0.730  0.679  0.632
      UNEMP             0.516  0.407  0.334  0.320  0.294     0.634  0.524  0.459  0.363  0.290

4.2 Estimates of γ-Correlations


As mentioned in Subsection 3.2, there are two ways of estimating $\gamma_2$, $\gamma_3$ and $\gamma_4$, that is, by
formulae (3) and (4). We call the method based on (3) Method 1 and that based on (4) Method 2.
Only Method 1 can be used to estimate $\gamma_1$, while direct estimation of $\gamma_5$ is feasible only by
Method 2. The two methods are compared in Table 3 using empirical data. In the table, the $\gamma_5$'s
for Method 1 are values predicted by a log-linear model. The table shows that the two methods
produced somewhat different results. The correlations produced by Method 2 clearly show
an increasing trend, contrary to our intuition, while Method 1 gave more acceptable results.
Moreover, if we compare these correlations with $\gamma_1$ in Table 4A (which had to be estimated
by Method 1), Method 1 seems to produce more reasonable results than Method 2. Therefore,
we adopted Method 1. However, if the assumptions held exactly, the two methods would be
equivalent and produce similar results. It seems that the real data do not fully conform to
the assumptions we made to derive the formulae.

Estimates of the γ-correlations are presented in Tables 4A and 4B. The γ-correlations
are much smaller than the ρ-correlations, as we expected. But they also reflect differences
in mobility of the labour force with different characteristics, as seen from the results for the
ρ-correlations.

The overall trend of the γ's is somewhat fuzzy, especially for the results from the 1985-87 data.
About 25% of the cases in Table 4B (a case is a row entry in the tables) show
an increasing trend. In those cases, the log-linear regression lines have a positive slope, even
though it is fairly small in magnitude. Moreover, in most of those cases the R-squares are small,
which indicates that the log-linear model does not fit well. This does not mean, however,
that there are other models which can fit the data better. Rather, it means that no clear trend
is exhibited. Among the cases that show a decreasing trend, about half have an
R-square greater than 0.5.

The results from the 1980-81 data show a quite different picture. There is only one case that 
shows an increasing trend and most of the cases have R-squares > 0.5. In fact, the results for 
NS and BC look more reasonable than those for ONT as far as the trend is concerned. 


Table 3
Comparison of Estimates of γ2, γ3, γ4 and γ5 Obtained by Different Methods
(Ontario, 1980-81)

Characteristic   Method     γ2      γ3      γ4      γ5
IN LF               1      0.141   0.128   0.133   0.135
                    2      0.107   0.105   0.116   0.120
EMP                 1      0.136   0.142   0.142   0.147
                    2      0.100   0.115   0.126   0.133
EMP AG              1      0.483   0.474   0.486   0.451
                    2      0.321   0.370   0.407   0.448
EMP NON-AG          1      0.150   0.147   0.157   0.163
                    2      0.117   0.134   0.145   0.149
UNEMP               1      0.074   0.076   0.063   0.080
                    2      0.043   0.056   0.046   0.043

Note: Methods 1 and 2 are defined by formulae (3) and (4) in Section 3, respectively.

Table 4A
Estimates of γ-Correlations
(1980-81 Data)
(? marks a value that is illegible in the source text)

Prov  Characteristic       γ1     γ2     γ3     γ4     γ5     γ6     γ7     γ8     γ9    γ10    γ11

NS    IN LF              0.288  0.263  0.265  0.250  0.236  0.233  0.211  0.199  0.193  0.167  0.164
      EMP                0.262  0.219  0.228  0.226  0.219  0.239  0.210  0.200  0.188  0.161  0.172
      EMP AG             0.351  0.308  0.283  0.237  0.205  0.190  0.141  0.113  0.063  0.021  0.007
      EMP NON-AG         0.238  0.187  0.189  0.180  0.164  0.151  0.123  0.121  0.136  0.091  0.086
      UNEMP              0.106  0.176  0.091  0.097  0.091  0.076  0.066  0.063  0.066  0.032  0.031

ONT   IN LF              0.161  0.141  0.128  0.133  0.135  0.136  0.125  0.127  0.124  0.122  0.117
      EMP                0.164  0.136  0.142  0.142  0.147  0.149  0.148  0.150  0.153  0.141  0.146
      EMP AG             0.477  0.483  0.474  0.486  0.451  0.474  0.459  0.429  0.394  0.323  0.368
      EMP NON-AG         0.184  0.150  0.147  0.157  0.163  0.167  0.166  0.169  0.174  0.156  0.165
      UNEMP              0.141  0.074  0.076  0.063  0.080  0.051  0.045  0.060  0.077  0.136  0.074

BC    IN LF                ?    0.137    ?    0.119  0.119    ?    0.101  0.126  0.094  0.066  0.070
      EMP                0.211  0.146  0.133  0.107  0.101  0.083  0.050  0.068  0.058 -0.033 -0.015
      EMP AG             0.380  0.311  0.301  0.272  0.241  0.216  0.198  0.170  0.122  0.078  0.071
      EMP NON-AG         0.207  0.166  0.161  0.129  0.108  0.093  0.069  0.038  0.023 -0.004 -0.020
      UNEMP              0.126  0.125  0.114  0.103  0.091  0.076  0.062  0.092  0.032  0.040  0.031

Table 4B
Estimates of γ-Correlations
(1985-87 Data)
(? marks a value that is illegible in the source text)

Prov  Characteristic       γ1     γ2     γ3     γ4     γ5     γ6     γ7     γ8     γ9    γ10    γ11

NS    IN LF              0.250  0.238  0.247  0.230  0.216  0.204  0.181  0.196  0.189  0.162  0.160
      EMP                0.170  0.183  0.205  0.196  0.185  0.157  0.158  0.194  0.198  0.219  0.198
      EMP AG             0.326  0.296  0.246  0.245  0.265  0.267  0.234  0.217  0.259  0.269  0.231
      EMP NON-AG         0.146  0.168  0.199  0.201  0.178  0.153  0.152  0.189  0.199  0.216  0.201
      UNEMP              0.233  0.267  0.241  0.211  0.206  0.168  0.171  0.176  0.157  0.187  0.147
      EMP 15-24          0.107  0.127  0.140  0.133  0.112  0.105  0.099  0.107  0.090  0.074  0.082
      EMP 25+            0.088  0.075  0.117  0.108  0.100  0.099  0.090  0.103  0.099  0.137  0.118
      UNEMP 15-24        0.051  0.080  0.042  0.024  0.054  0.061  0.079  0.081  0.058  0.011  0.049
      UNEMP 25+          0.155  0.129  0.177  0.171  0.148  0.159  0.158  0.127  0.102  0.134  0.124

ONT   IN LF              0.162  0.138  0.141  0.134  0.132  0.135  0.127    ?      ?    0.103  0.101
      EMP                0.114    ?      ?    0.122    ?      ?      ?      ?      ?    0.112  0.111
      EMP AG             0.508  0.518  0.553  0.561  0.571  0.569  0.582  0.617  0.668  0.650  0.672
      EMP NON-AG         0.133  0.140  0.132  0.140  0.157  0.156  0.168  0.182  0.204  0.205  0.210
      UNEMP              0.030  0.047  0.055  0.047  0.043  0.048  0.039  0.030  0.039  0.048  0.041
      EMP 15-24          0.012 -0.006  0.018  0.031  0.017  0.023  0.011  0.011  0.016  0.044  0.029
      EMP 25+            0.354  0.358  0.349  0.343  0.319  0.312  0.298  0.285  0.276  0.240  0.246
      UNEMP 15-24        0.068  0.039  0.038  0.058  0.033  0.026  0.008  0.018  0.011 -0.002 -0.006
      UNEMP 25+          0.052  0.054  0.033  0.017  0.034  0.033  0.026  0.018  0.021  0.044  0.022

BC    IN LF              0.103  0.095  0.113  0.103  0.090  0.090  0.091  0.083  0.078  0.030  0.055
      EMP                0.125    ?      ?      ?      ?      ?      ?      ?      ?    0.095  0.114
      EMP AG             0.394  0.443  0.426  0.401  0.396  0.400  0.401  0.381  0.347  0.334  0.345
      EMP NON-AG         0.080  0.067  0.076  0.072  0.091  0.109  0.111  0.118  0.112  0.106  0.124
      UNEMP              0.096  0.086  0.084  0.080  0.083  0.097  0.068  0.074  0.068  0.083  0.071

Table 5
Estimates of τ-Correlations
x1: EMP 15-24, x2: EMP 25+, x3: UNEMP 15-24, x4: UNEMP 25+
(1985-87 Data)

Province  Characteristic      τ0     τ1     τ2     τ3     τ4     τ5

NS        (x1,x2)           0.150  0.140  0.148  0.181  0.187  0.196
          (x1,x3)          -0.440 -0.275 -0.187 -0.135 -0.039  0.126
          (x1,x4)          -0.036 -0.040 -0.043 -0.015  0.024  0.022
          (x2,x3)          -0.029 -0.037 -0.078 -0.049 -0.016 -0.038
          (x2,x4)          -0.437 -0.374 -0.276 -0.182 -0.231 -0.094
          (x3,x4)           0.136  0.127  0.094  0.055  0.049  0.020

ONT       (x1,x2)           0.092  0.070  0.055  0.040  0.028  0.010
          (x1,x3)          -0.420 -0.267 -0.205 -0.161 -0.145 -0.010
          (x1,x4)          -0.065 -0.056 -0.053 -0.036 -0.028 -0.019
          (x2,x3)          -0.061 -0.054 -0.054 -0.042 -0.089 -0.074
          (x2,x4)          -0.392 -0.303 -0.230 -0.187 -0.181 -0.077
          (x3,x4)           0.058  0.043  0.022  0.013  0.022  0.001


4.3 Estimates of τ-Correlations


Table 5 contains estimates of τ-correlations obtained from the 1985-87 data for all possible
combinations of EMP 15-24 (denoted by x1), EMP 25+ (x2), UNEMP 15-24 (x3) and
UNEMP 25+ (x4). The correlations between x1 and x2 are positive, as are those between
x3 and x4. Other correlations are mostly negative. In terms of magnitude, only the correlations
pertaining to (x1, x3) and (x2, x4) are quite different from zero. Others are close to zero.
These observations seem to agree with what we understand about the movement of the labour force
between the employed and the unemployed in the same age group. When employment
increases, unemployment decreases, and vice versa. The trend is obviously upward in these
cases.

The data were fitted by a log-linear model and the $\tau_5$'s were predicted. The model fit seems
reasonable except for the correlations between (x2, x3), whose R-squares are very small in both
provinces, NS and ONT.


4.4 Conclusions 


The estimation of correlations from complex survey data is a difficult problem. This is
not because the derivation of formulae is difficult (in fact, the formulae given here are
elementary) but because there are many practical constraints in applying the formulae. If we
had not made the assumptions in Section 3, the estimation of the panel correlations by using
the existing computer program would have been impossible. On the other hand, these assumptions
should conform to the real data to which the formulae are applied. In our case,
the assumptions appear not to conform fully to the real data,
as indicated by the discrepancy between the results obtained by formulae (3) and (4) (see
Table 3). Nevertheless, the estimates are not thought to be unreasonable.

In a study of the composite estimator for the LFS, the results given in this paper were
successfully used to compare various composite estimators (Kumar and Lee 1983). Recently
Binder and Dick (1990) proposed a method for analyzing seasonal ARIMA models by taking
the survey errors into account. They applied their technique to the LFS data using the estimated
panel correlations. However, when results obtained by the use of the estimated
panel correlations are sensitive to the accuracy of these estimates, they should be
interpreted carefully.

First Wave Effects in the U.S. Consumer
Expenditure Interview Survey 


ADRIANA R. SILBERSTEIN¹


ABSTRACT 


Panel responses to the U.S. Consumer Expenditure Interview Survey are compared, to assess the magnitude 
of telescoping in the unbounded first wave. Analysis of selected expense categories confirms other studies’ 
findings that telescoping can be considerable in unbounded interviews and tends to vary by type of expense. 
In addition, estimates from the first wave are found to be greater than estimates derived from
subsequent waves, even after telescoping effects are deducted, and much of these effects can be attributed
to the shorter recall period in the first wave of this survey. 


KEY WORDS: Bounding; Telescoping; Recall Bias; Conditioning. 


1. INTRODUCTION 


Respondents to retrospective surveys are asked to recall details of events within a specific
time interval, or reference period, and this task of identifying the correct time in which events
occurred may be as difficult as remembering the events. Misdating, or "telescoping", is widely
recognized as a source of error in surveys, although it is rarely studied directly (Neter and
Waksberg 1965). Respondents tend to include in the report events that occurred outside the
reference period (external telescoping), e.g., when events are recalled as more recent than they
actually are (forward telescoping). Data that can be validated with independent records show
that both forward and backward misdating errors are made by respondents (Mathiowetz 1985).
This could be "due to the respondent’s wish to perform the task required.... When in doubt,
the respondent prefers to give too much information rather than too little" (Sudman and
Bradburn 1974, p. 69). The net effect of telescoping is generally forward. Bounding methods
are designed to create boundaries around the reference period of the survey report, and, in
so doing, avoid misdating errors by respondents. A method for bounding the starting point
of the reference period, best applied during the interview, involves comparing events reported
in a prior interview and deleting duplicate reports. Extending the reference period up to the
interview day is a method commonly used to bound the end of the reference period.
"Unbounded" reports result by necessity from one-time surveys, and for questions asked only
once or for the first time in panel surveys, since no prior data exist to check for erroneous
inclusions. These effects can be reduced by including "anchoring" techniques during the
interview, e.g., constructing a time line (Mingay 1987, p. 132).

This paper is concerned with reporting levels experienced by first time respondents of panel 
surveys, and provides a comparative analysis of first and subsequent interview waves. The 
study investigates potential telescoping, conditioning, and recall length effects in estimates of 
household expenditures, based on data reported in the U.S. Consumer Expenditure (CE) 
Interview Survey for the year 1984. This survey is one of two independent components designed 
to collect national data on household expenditures, the other component being the Diary Survey. 


¹ Adriana R. Silberstein is a Mathematical Statistician, Office of Prices and Living Conditions, Statistical Methods
Division, U.S. Bureau of Labor Statistics, Washington, D.C. 20212, USA. 
The survey is conducted by the Census Bureau under contract to the Bureau of Labor Statistics.
The first wave of the CE Interview Survey is used to establish cooperation, collect initial
inventory data on household possessions, and bound the second wave. There are four subsequent
waves of interviews three months apart, collecting data for the previous three calendar months
up to the interview day. The bounding method is as follows. Expenses reported for the
portion of the calendar month in which the interview takes place (or "current month") are later
transcribed onto the next wave questionnaire; this information is available to the interviewer
to check for duplicate reports, but is not read to respondents. Data collected during the first
wave pertain to expenditures for the current month and for one previous calendar month; these
latter expenditures are excluded entirely from the estimates, while current month expenditures
become part of the second wave. More details on collection and estimation methods can be
found in the 1984 Bulletin (U.S. Bureau of Labor Statistics 1986), and are discussed by
Silberstein and Jacobs (1989).

The findings underscore the need for bounding methods in retrospective data collection, 
since sizable telescoping effects may be present in unbounded recall. In addition, the analysis 
points out that first time responses may yield higher estimates even after telescoping effects 
are deducted. These first wave effects may be a direct result of the shorter recall in this wave 
of the CE Interview Survey, although other factors are not excluded. A discussion of the analysis 
used to identify telescoping effects is included in section 2, and estimates of telescoping and 
first wave effects are included in section 3. Conclusions can be found in section 4. 


2. IDENTIFYING TELESCOPING EFFECTS 


2.1 Method of Analysis 


One approach for identifying telescoping errors, discussed by Kalton et al. (1989, p. 257),
is to examine whether there are duplicates in individual responses to consecutive waves. This
micro-level approach is not necessarily accurate, as the respondent for a given household may
change from one wave to the next. The method is also impractical, since independent records,
needed to reconcile discrepancies on dates, may not be readily available. Duplicate responses
may not be recorded as such in an ongoing survey, even when they are identified during the
interview, as in the CE Interview Survey. More commonly, telescoping effects are evaluated
at the aggregate level, by comparing estimates of unbounded and bounded responses, with
certain precautions. Tracking the experience of several panels is advisable in order to
overcome seasonal incomparabilities, since bounded responses are reported subsequently to
unbounded responses and, therefore, do not refer to the same time interval. Another factor
to account for in the comparisons is panel conditioning, a phenomenon that refers to changes 
in respondent behavior as a result of being part of a panel, or to changes in the quality of 
reports. The assumptions made and the method of estimation used in this study are discussed 
in section 3, whereas the preliminary testing procedure is described here. 

The first step in the analysis is to ascertain whether symptoms of external telescoping can 
be detected from the survey data. A level of reporting in the first wave that is higher than 
expected is an indication of telescoping. Unbounded interviews are known to yield higher 
estimates than bounded interviews, as documented in several studies that compared unbounded 
and bounded responses (Neter and Waksberg 1964 and 1965; Murphy and Cowan 1976; 
Cantor 1985). Another indication is the presence of differential effects across separate types 
of the collected data. Major sources of differences in the way events are retrieved and stated 
by respondents are recall bias and telescoping. The relationship of these factors suggests that
smaller expenses are forgotten as time increases, but larger more salient expenses, that tend
to be remembered better, are more often telescoped.

Telescoping errors can also occur in bounded responses, causing the forward shifting of 
data within the reference period (internal telescoping). While overall estimates do not change 
as a result of these effects, the distribution for the three recall months is affected. Reports of 
apparel and home furnishing and equipment expenses were selected for the study, because
characteristics of these expenses were helpful in the analysis. These commodities include
expenditures of various degrees of salience, and were grouped accordingly. They also tend to differ
by degree of underreporting. Many apparel estimates are 40% below the estimates from the
National Accounts (NA), and several estimates for home furnishings and equipment are also 
lower than NA estimates. Estimates for furniture and selected equipment categories, on the 
other hand, are only 7% below the independent estimates (Gieseman 1987, p. 11), and higher 
reports in the first wave can be interpreted as the result of external telescoping. 

The hypothesis evaluated is whether the first recall month of bounded waves, i.e., the month
prior to the interview, is reported similarly to the past month in the first wave. The Hotelling
T² was used to test differences in eight expenditure groups within each of the two commodities.
Given two vectors of means in a repeated-measures design, a two-tailed .05-level test of
$H_0: C\mu = 0$ (equality of means) versus $H_1: C\mu \ne 0$ was applied. $H_0$ was rejected if:


$$\frac{(C\bar{x})'(CSC')^{-1}\,C\bar{x}}{np/(n - (p - 1))} > F_{p,\,n-p+1}(.05), \qquad (1)$$


where $\bar{x}$ is a vector of sample means within each commodity (ordered as shown in the tables),
S is the covariance matrix computed with the method of balanced repeated replication
(n = 20 replicates), C is the contrast matrix, and p is the number of contrasts
in C.
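
The left-hand side of the rejection rule (1) can be sketched in Python as below. The contrast matrix used in the study is not reproduced in this text, so the means, covariance matrix and single contrast in the example are hypothetical; the function names are also illustrative. The F percentile is not computed here and would have to be taken from tables or a statistics library.

```python
def solve_linear(matrix, rhs):
    """Solve a small square linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def hotelling_ratio(means, cov, contrasts, n_replicates):
    """Return (C x)'(C S C')^{-1}(C x) divided by n p / (n - p + 1)."""
    p = len(contrasts)
    k = len(means)
    cx = [sum(row[i] * means[i] for i in range(k)) for row in contrasts]      # C x
    cs = [[sum(row[i] * cov[i][j] for i in range(k)) for j in range(k)]
          for row in contrasts]                                               # C S
    csc = [[sum(cs_row[j] * other[j] for j in range(k)) for other in contrasts]
           for cs_row in cs]                                                  # C S C'
    z = solve_linear(csc, cx)                                                 # (C S C')^{-1} C x
    quadratic_form = sum(a * b for a, b in zip(cx, z))
    return quadratic_form / (n_replicates * p / (n_replicates - p + 1))


# Hypothetical two-mean example with one contrast comparing the two means.
ratio = hotelling_ratio([3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, -1.0]], 20)
print(ratio)
```

The null hypothesis is rejected when the returned ratio exceeds the .05-level F percentile with p and n - p + 1 degrees of freedom.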


Simultaneous confidence intervals for individual comparisons by group were derived using
the Bonferroni method (Johnson and Wichern 1988), with percentile $t_n(.05/2p)$. Expenditure
means were computed using a log transformation of individual expenses reported in the first
recall month. Sample weights included adjustments for nonresponse and subsampling, but 
excluded final weight factors for population controls, which were not available for the first 
wave. Note that weight adjustments for the first wave were computed only as part of this 
research, since they are not needed in the ongoing estimation process. 

Data from waves 2 to 5 were combined, since differences between these waves were very 
small. Responses by participants in all five waves (3200 respondents) were selected to assure 
comparability between the waves and bounding of waves 2 to 5. Unbounded interviews are 
experienced by new panel respondents, e.g. new occupants at a sample address, and by 
respondents who do not participate in one or more waves during the panel. In 1984, 89% of
the interviews in waves 2 to 5 were bounded, 8% were unbounded because respondents were
new to the panel and 3% were unbounded resulting from a previous refusal or other
noncooperation (Silberstein 1988). Estimates are affected by unbounded responses, as pointed out
by Biderman and Cantor (1984), but this aspect is not treated directly in this study. 

2.2 Test Results


Comparisons between means are shown in Table 1 in the original scale, i.e., without the
log transformation used in the statistical tests. The first wave displays higher means in nearly
all expense groups, and the overall test is significant. The tests for the individual groups
reveal that significant differences are found only for large expenditures, such as coats and
jackets in apparel and appliances and furniture in home furnishings and equipment. The
groups with significant differences are more represented in wave 1 than in other waves, not
surprisingly: they account for 19% of total apparel and 72% of total home furnishings in
the first wave, compared to 16% and 67%, respectively, in the first recall month of other
waves, as shown in Table 2 (columns 1 and 2). A greater number of expenses are also reported
in wave 1 for these groups of expenses (Table 2, columns 3 and 4). In addition, the average
dollar value of reported expenses in wave 1 tends to be different from the other waves for
big-ticket items (e.g., major appliances), but very similar for smaller items (Table 2, columns
5 and 6).


Table 1
Percent Difference in Expenditure Means
Wave 1 Versus First Recall Month of Waves 2 to 5
(? marks a value that is illegible in the source text)

                                                  % Difference (a)      S

APPAREL: (b)                                           14.5*           4.9
  Coats, jackets, furs, suits                          39.6*          12.9
  Trousers, slacks, jeans                              13.6            9.5
  Shirts, blouses, tops                                 9.7            5.6
  Sweaters, dresses, skirts                            16.4            4.7
  Undergarments, hosiery                                6.9            5.4
  Miscellaneous and combined clothing                  -2.5             ?
  Footwear                                               ?             6.1
  Other apparel items and services                     27.4           20.4
  Overall test value: 4.16*

HOME FURNISHINGS AND EQUIPMENT: (b)                    48.6*           8.4
  Major appliances                                       ?              ?
  Other appliances                                       ?            17.0
  Furniture                                              ?            24.8
  Large household and entertainment equipment          34.2*          16.0
  Other household and entertainment equipment            ?              ?
  Home furnishing repair and services                   7.0           14.6
  Dishes, decorative items, linens                     14.0           16.0
  Floor and window coverings                           52.5           24.3
  Overall test value: 13.86*

(a) Positive values indicate first wave mean is greater. Base of percentages is mean of first recall month in waves 2 to 5.
(b) Commodity totals not included in overall test.
S Standard error of percent difference.
* Significant (α = .05).
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Table 2 
Comparisons of First Wave and First Recall Month of Subsequent Waves 
Percent Percent Average 
of of Total Dollar 
Total Number of Value of 
Expenses Expenses Expenses 


Wave Waves Wave Waves Wave Waves 
1 2 to 5 1 2 tow 1 ZO 


(1) (2) (3) (4) (5) (6) 


APPAREL: 100.0 100.0 100.0 100.0 $ 35 $ 33 
Coats, jackets, furs, suits 19.2 a) 9.3 8.6 71 59 
Trousers, slacks, jeans 10.7 10.8 10.6 9.8 36 35 
Shirts, blouses, tops 10.0 10.4 12.0 12-2 31 29 
Sweaters, dresses, skirts 14.3 14.0 13.0 12.4 38 37 
Undergarments, hosiery Si 5.6 16.8 16.7 11 11 
Miscellaneous and combined clothing eye) 18.2 15.4 16.4 36 38 
Footwear fie7 13.1 12.8 13.6 33 31 
Other items and services 13.5 12.2 10.1 10.4 45 40 


HOME FURNISHINGS AND 


EQUIPMENT: 100.0 100.0 100.0 100.0 $123 $ 92 
Major appliances 11.4 9.6 4.2 3.4 370 277 
Other appliances 28 232: 9.2 viral 29 30 
Furniture 28.3 19.9 8.9 a 385 251 
Large household and entertainment equipment 19.7 21.8 8.8 7.6 262 266
Other household and entertainment 

equipment 10.7 13.4 Zed 22.8 58 56 
Home furnishing repair and services 4.7 6.6 8.4 9.5 67 65
Dishes, decorative items, linens 12.9 16.8 33.1 S725 46 39 


Floor and window coverings 10.0 9.8 4.6 4.5 294 172 


These differences can be interpreted in several ways, e.g., they may indicate that more expen- 
sive purchases are reported in the first wave, or that purchases reported in the first wave are 
remembered as more expensive. Another interpretation is that a period of time longer than 
a month may be covered by respondents when the recall is unbounded, especially for large, 
easily remembered, expenses. In Table 3, comparisons by wave are extended to include the 
three recall months of subsequent waves. The findings are consistent with the previous tests, 
but tend to narrow in on the issue of telescoping effects. These comparisons are made on the 
basis of reporting rates according to the dollar value of the expense. The reporting rate is defined 
as the percentage of respondents reporting one or more expenses of a given type. Note that
individual expenses are generally entered on the questionnaire, with the exception of expenses for
the same item, month and person in the family, which are usually reported as combined totals 
and counted as one ‘‘expense’’. 
Table 3
Monthly Reporting Rates by Expense Size

                                      Wave 1    Waves 2 to 5 by Recall Month
                                                First    Second    Third
                                              Percent of respondents
                                       (1)       (2)       (3)      (4)

APPAREL: 

No Apparel Expenses (a) 28.8 29.3 38.2 45.5 
Less than $10 38.4 EM Oe! 27.9 25.4 
$ 10 to $ 40 57.9 Sone 45.3 41.0 
$ 40 to $100 aoe 31.0 26.5 21.0 
$100 and over 0s lau TES 8.8 

Wave 1 vs 1st recall month of waves 2 to 5
Overall test value: 29.1* 

HOME FURNISHINGS AND EQUIPMENT:

No Home Furnishing Expenses (a) 48.1* 51.2 58.5 62.4 
Less than $10 1223 | Was ew) thee) 
$ 10 to $ 40 30.9 30.0 25.0 eek 
$ 40 to $100 213% 18.4 14.9 12.8 
$100 to $400 18278 13.8 a1 10.3 
$400 and over 8.6* 5.6 ay! 4.6 


Wave 1 vs 1st recall month of waves 2 to 5
Overall test value: 17.0* 


(a) Category included in overall test. 
* Significant (α = .05)


Consistent with the previous comparisons, the overall test is significant and the individual 
comparisons show that significantly more respondents report expenses of $100 or more in the 
first wave; reporting rates for smaller expenses, by contrast, are not significantly different. When
the three recall months are examined, the reporting rates for the first recall month appear to 
be closer to the first wave than to the other two months. The three recall months in waves 2 
to 5 show a familiar pattern of decreased reporting, and noteworthy is the increase in the percent 
of respondents reporting ‘‘no expenses’’. This pattern is evident in each panel wave, as 
documented by Silberstein and Jacobs (1989) and further studied by Silberstein (1989), and 
is more likely due to recall effects than telescoping. When reporting rates are recomputed to 
include only respondents who report the commodity, it is found that there are more similarities
among the three recall months in subsequent waves than with the first wave. (The rates can 
be derived from Table 3, by using the percentage of reporters with expenses as the base.) These 
reporting rates for home furnishing items of $100 and over are 53% in the first wave and 40%, 
41%, and 40%, respectively, in the three recall months of other waves. For apparel items of 
$100 and over the rates are 24% in the first wave and 19%, 19%, and 16%, respectively, in 
the three recall months of other waves. These differences are believed to be symptomatic of 
external telescoping in the unbounded recall. 
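
The recomputed rates quoted above can be checked, at least approximately, against the legible entries of Table 3. The sketch below is illustrative only: it assumes that the two largest size classes for home furnishings ($100 to $400, and $400 and over) can simply be added, and that the base is 100 minus the percentage of respondents reporting no expenses. The function name is hypothetical.

```python
def rate_among_reporters(size_class_rates, no_expense_rate):
    """Percent of reporters (respondents with any expense) in the given size classes."""
    reporters = 100.0 - no_expense_rate      # percent of respondents with an expense
    return 100.0 * sum(size_class_rates) / reporters

# Home furnishings, waves 2 to 5, entries read from Table 3.
first_month = rate_among_reporters([13.8, 5.6], no_expense_rate=51.2)
third_month = rate_among_reporters([10.3, 4.6], no_expense_rate=62.4)
print(round(first_month), round(third_month))   # 40 40
```

Both values agree with the 40% quoted in the text for the first and third recall months.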
3. ESTIMATING TELESCOPING AND FIRST WAVE EFFECTS


3.1 Telescoping Effects 


The hypothesis of equality of means implied the response task in the first wave is similar 
to the one experienced for the first recall month in subsequent waves. The data did not sup- 
port the hypothesis, since differential effects were found, suggesting external telescoping in 
the first wave. The results tend to agree with the notion, forwarded by Loftus (1986, p. 196), 
that internal telescoping may ‘‘arise from a different cognitive mechanism’’ than external 
telescoping. A general definition of external telescoping (\(\beta\)), on a monthly basis and assuming
no panel conditioning, is given by the ratio of unbounded one month recall (with sample mean
\(\bar{x}_U\)) and bounded one month recall (with sample mean \(\bar{x}_B\)):


\[ \beta = (E\bar{x}_U / E\bar{x}_B) - 1. \tag{2} \]


This expression may be an overstatement since conditioning effects contribute to lower values
for the bounded mean. Panel responses commonly display a downward trend, due to decreased
reporting with increasing time-in-sample (TIS) (Bailar 1989). Conditioning effects (\(\alpha\)) between
two consecutive waves can be defined by the ratio of the two responses (with sample means
\(\bar{x}_i\) and \(\bar{x}_{i+1}\)):


\[ \alpha = 1 - (E\bar{x}_{i+1} / E\bar{x}_i). \tag{3} \]


A number of assumptions were made to develop telescoping estimates from the survey data. 
Expenditure means of bounded one month recall, needed for comparisons with the first wave, 
cannot be obtained directly from the three month recall. Monthly means computed by dividing 
the bounded three month recall by a factor of three are not acceptable, considering the recall 
loss evident in the third recall month of the CE Interview Survey. As an alternative, the first 
and second recall months were used to estimate bounded monthly means, assuming that recall 
bias in the second month is moderate and telescoping into the first recall month is mostly from 
the second recall month. The estimating method is an adaptation of the model developed by 
Neter and Waksberg in analyzing the 1960 experimental study of expenditures for Residential 
Alterations and Repairs (Neter and Waksberg 1964 and 1965). The model implies that tele- 
scoping and conditioning effects are multiplicative and conditioning compounds with time-in- 
sample. Since conditioning effects are derived from relationships observed between second and 
third waves, two terms are necessary when estimating (2) under the assumption of conditioning. 
An estimate of telescoping is therefore: 


\[ b_c = (\bar{x}_U / \bar{x}_B)(1 - a)(1 - a/2) - 1. \tag{4} \]


The derivation of (4) is given in the appendix. The conditioning rate (a) was assumed to 
be constant between waves, considering the special subset of respondents in all five waves. 
(The Neter/Waksberg model assumed greater effects between the first and second wave.) 
Time-in-sample effects appear to be small in the CE Interview Survey, judging from a study 
that compared responses in waves 2 to 5 (Silberstein and Jacobs 1989). An explanation for this 
may be that declines in reporting are offset by improvements in reporting, as respondents 
become more knowledgeable about the reporting process. Two conditioning assumptions 
provided two estimates of telescoping effects, using (4): a = 0 (no conditioning), and a > 0
(conditioning equal to the rate observed between second and third waves). Four apparel groups
and three home furnishing and equipment groups showed some decline from second to third 
waves, displayed as positive proportions in column 5 of Table 4. These ratios, while not 
Table 4
Telescoping Estimates Based on Expenses

                                   Telescoping effects b_c       TIS effects
                                   If a = 0        If a > 0           a
                                   %      S        %      S
                                  (1)    (2)      (3)    (4)         (5)

APPAREL: 28.4 7.0 - - — 0.02 
Coats, jackets, furs, suits 46.2 14.2 - - — 0.01 
Trousers, slacks, jeans 30.3 8.6 12.3 11.8 0.10 
Shirts, blouses, tops pe 7.8 17.6 16.7 0.05 
Sweaters, dresses, skirts 28.3 5.9 8.7 15.0 Oak 
Undergarments, hosiery 22uz 6.9 daz 12.7 0.08 
Miscellaneous and combined clothing See 9.5 - - —0.18 
Footwear 18.1 oie | - - — 0.08 
Other items and services 54.9 35.8 - - —0.15 
HOME FURNISHINGS AND 

EQUIPMENT: 63.1 8.9 - - — 0.04 
Major appliances 95.4 30.7 - - — 0.03 
Other appliances 76.4 16.1 36.0 19.7 0.16 
Furniture 13-3 2 - - —0.05 
Large household and entertainment 

equipment 38.7 13.1 36.5 S3er 0.01 
Other household and entertainment 

equipment 26.2 8.9 - - —0.11 
Home furnishing repair and services 15.6 14.5 - - — 0.29 
Dishes, decorative items, linens 45.4 14.4 ~ - — 0.06 
Floor and window coverings 89.4 38.0 66.8 68.7 0.08 


a Time-in-sample (TIS), or conditioning, effects when positive. 
S Standard error of percent difference. 


significant (.05 level), were applied as the conditioning loss between the first and the second 
wave. Net increases in reports were not considered realistic for the unknown conditioning 
between these two waves. 
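
The arithmetic behind estimate (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the survey's production code: the function names and the monthly means are hypothetical, and the formula is taken as printed, with the conditioning rate a computed from two consecutive waves.

```python
def conditioning_rate(mean_wave2, mean_wave3):
    """a = 1 - (x3 / x2): proportional decline between two consecutive waves."""
    return 1.0 - mean_wave3 / mean_wave2

def telescoping_estimate(unbounded_mean, bounded_mean, a=0.0):
    """Estimate (4): b_c = (x_U / x_B)(1 - a)(1 - a/2) - 1."""
    return (unbounded_mean / bounded_mean) * (1.0 - a) * (1.0 - a / 2.0) - 1.0

# Hypothetical monthly means, chosen only to illustrate the arithmetic.
a = conditioning_rate(mean_wave2=100.0, mean_wave3=90.0)        # 0.10
b_no_cond = telescoping_estimate(130.0, 100.0)                   # 0.30
b_cond = telescoping_estimate(130.0, 100.0, a=a)                 # about 0.11
print(f"a = {a:.2f}, b(a=0) = {b_no_cond:.3f}, b(a>0) = {b_cond:.3f}")
```

With these made-up inputs, a 10% conditioning rate lowers the telescoping estimate from 30% to roughly 11%, the same direction of change seen between columns 1 and 3 of Table 4.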

The results give indications of the increase that would occur in the estimates in the absence 
of bounding. Table 4 shows estimates of telescoping effects in percentage form, excluding 
conditioning effects (column 1), and including them (column 3). Telescoping levels of 40% 
or higher are estimated for ‘‘Coats, etc.’’ and ‘‘Other items and services’’ (a group that includes
watches and jewelry), but much lower levels are estimated for other apparel groups. High 
telescoping levels (63%, on average) are estimated for home furnishing and equipment expenses. 
Telescoping estimates decrease considerably when some conditioning effects are taken into 
account, and would be even lower if greater conditioning effects were assumed between wave 
1 and wave 2. While these estimates are affected by sampling variability and the assumptions 
made, the results are consistent with findings reported in other surveys. Neter and Waksberg 
(1965) reported average telescoping effects of 55% with no conditioning losses and 39% with 
conditioning losses, for home improvement expenditures; telescoping effects were much lower 
for small jobs. Estimates derived from the 1974/75 Crime Survey indicated telescoping
effects of 44% for personal victimization incidents and 40% for property victimization
(Murphy et al. 1976).
3.2 First Wave Effects


Differences in responses between first and subsequent waves reflect many cognitive aspects 
of panel interviews. This section discusses some of the factors involved, and includes a 
preliminary investigation of net effects. Provided that respondents participate in the whole 
panel, there is a progressive relationship between respondent and interviewer and clearer
expectations on both sides. Quite a few interview conditions change, however. While in some 
panel surveys subsequent waves may be presented as follow-ups to the first wave, in the CE 
Interview Survey respondents are asked to report for a period of time three times as long after 
the first wave and detailed income information is asked in waves 2 and 5. This greater reporting 
load, and a resulting faster interview pace, has a negative impact on reporting levels, even for 
the first recall month of these waves. More expense records, e.g., check books and bills, may 
be used in these waves compared to the first wave, making the bounded reports less likely to 
be affected by telescoping within the three recall months. The first wave is an easier interview, 
especially with regard to categories of expenses sensitive to the length of the reference period 
and the number of persons in the household, e.g. apparel expenses. The relative importance 
of these factors should be researched in field and laboratory studies. 

Separate estimates of first wave means, net of telescoping, were developed using the two
sets of telescoping effects shown in Table 4. These means (\(\bar{x}_{B1}\)) were derived by dividing the
unbounded means by one plus the telescoping estimates:


\[ \bar{x}_{B1} = \bar{x}_U / (1 + b_c). \tag{5} \]


Results are summarized by commodity in Table 5. Both estimates of net first wave means are 
higher than means of waves 2 to 5 for all recall months combined, shown in column 2. The 
total apparel mean is 10% higher in the first wave when conditioning effects are not included, 
and 16% higher when they are included. The home furnishing and equipment means are also 
higher, but at a smaller scale: 3% without conditioning and 5% with conditioning. These 
estimated effects, remaining after telescoping, are interpreted as resulting from the shorter recall 
period and lesser reporting load in the first wave. The differences between the two commodities 
and the results for specific groups of expenditures imply that potential gains in reporting tend 
to increase for smaller expenses, but become quite marginal for big-ticket items. 
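
As a worked check of equation (5), the sketch below recomputes the net first wave means from the Wave 1 means in Table 5 and the overall telescoping effects with a = 0 in Table 4 (28.4% for apparel, 63.1% for home furnishings and equipment). The function names are hypothetical and the inputs are the rounded published figures, so only agreement to the nearest dollar or percent should be expected.

```python
def net_first_wave_mean(unbounded_mean, telescoping_effect):
    """Equation (5): first wave mean with the estimated telescoping removed."""
    return unbounded_mean / (1.0 + telescoping_effect)

def percent_excess(mean, reference):
    """Percent by which `mean` exceeds `reference`."""
    return 100.0 * (mean / reference - 1.0)

# Wave 1 annual means (Table 5, column 1) and telescoping effects with a = 0 (Table 4, column 1).
apparel_net = net_first_wave_mean(1663.0, 0.284)
home_net = net_first_wave_mean(1972.0, 0.631)
print(round(apparel_net), round(home_net))              # 1295 1209

# Excess over the waves 2 to 5 means for all recall months (Table 5, column 2).
print(round(percent_excess(1295.0, 1182.0)))             # 10
print(round(percent_excess(1209.0, 1179.0)))             # 3
```

The results match column 4 of Table 5 and the 10% and 3% differences quoted in the text for the case without conditioning.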


Table 5
Summary Comparisons of First Wave and Subsequent Waves
Annual Expenditure Means (Standard errors)

                          Wave 1    Waves 2 to 5   Waves 2 to 5    Wave 1 Net of Telescoping
                           (a)       All Recall    First Recall    Assuming no    Assuming
                                       Months         Month        TIS Effects   TIS Effects
                           (1)          (2)            (3)            (4)           (5)

APPAREL                   $1,663      $1,182         $1,452         $1,295        $1,370
                          (59.6)      (61.7)         (71.0)         (66.2)        (n.a.)
HOME FURNISHINGS AND      $1,972      $1,179         $1,327         $1,209        $1,235
EQUIPMENT                 (85.0)      (59.7)         (73.1)         (61.5)        (n.a.)


(a) Means differ from published 1984 estimates, due to special subset of respondents and missing final weight factors. 


302 Silberstein: Consumer Expenditure Interview Survey 


4. CONCLUSIONS 


This paper provides an investigation of potential telescoping effects in unbounded inter- 
views. These effects appear to be considerable, especially for more salient or prominent events. 
Results from the U.S. Consumer Expenditure Interview Survey indicate that estimates of large 
infrequent expenses, based on unbounded one month recall, may be between 30% and 50% 
overstated. Lower overstatement levels are more likely in estimates of small frequent expenses. 
These findings are in close agreement with other studies on the subject. The study demonstrates 
that external telescoping effects are much greater than internal telescoping effects within a three 
month recall period of subsequent waves. In addition, the first wave of the panel survey studied 
was found to exhibit higher means than the overall means for subsequent waves, even after 
estimated telescoping effects were deducted. Since the first wave in this survey has one month 
recall, it is concluded that considerable improvements in reporting levels can be expected from 
a shorter recall. The potential gains are estimated to be at least 10% for frequent expenditures, 
but would become marginal as the value of the expenditure increases. 

Although the one month recall is viewed as the major reason for the higher estimates, 
other factors are not excluded. Conditioning effects, assumed constant in this study, may 
vary between waves. Estimates of one month recall would be even greater, if higher condi- 
tioning effects were assumed between the first and second waves. Cognitive aspects of the 
interview, e.g., respondents’ cooperation and involvement, and interviewers’ approach to
collecting data, should be researched in order to understand panel conditioning. The issue of 
differential effects by type of expenditure should also be addressed within this context. Field 
and laboratory studies of these data collection aspects would have implications for improving 
panel survey methodology. 
APPENDIX


(1) Explanation of Selected Expenditure Groups 


SELECTED APPAREL 


Miscellaneous and combined clothing: nightwear, loungewear, accessories, uniforms, 
and clothing items for infants under 2. 


Other apparel items and services: watches, jewelry, sewing materials for making clothes, 
repair and alteration services, and clothing rental or storage. 

SELECTED HOME FURNISHINGS AND EQUIPMENT 
Other appliances: small electric kitchen and personal care appliances. 


Large household and entertainment equipment: lawn mowers, window air conditioners, 
televisions, sound equipment, and bicycles. 


Other household and entertainment equipment: radios, tape recorders, tools, calculators, 
camping or sports equipment, and infants’ equipment.


(2) Estimates of Telescoping Effects 
(Adapted from: Neter and Waksberg (1965), 33-37). 


For each expenditure group


Let: \(\bar{x}_U\) = unbounded one month recall sample mean;


\(\bar{x}_B\) = bounded one month recall sample mean, not directly observed in the CE Interview
Survey;


\(\bar{x}_2, \bar{x}_3\) = one-month-average sample means from waves 2 and 3, respectively, computed using
first and second recall months.


Define: Telescoping effect, \(\beta\), assuming no conditioning

\[ \beta = (E\bar{x}_U / E\bar{x}_B) - 1. \tag{1} \]

Conditioning effect, \(\alpha\), between two consecutive waves

\[ \alpha = 1 - (E\bar{x}_{i+1} / E\bar{x}_i). \tag{2} \]


Then, assuming telescoping compounds on conditioning,

\[ \beta_c = (E\bar{x}_U / E\bar{x}_B)(1 - \alpha) - 1 \tag{3} \]

is the telescoping effect under conditioning.
Using the estimated conditioning effect between 2nd and 3rd waves, \(a = 1 - (\bar{x}_3/\bar{x}_2)\), the
estimated mean for bounded one month recall is:

\[ \bar{x}_B = (\bar{x}_2 + \bar{x}_3)/2 = (\bar{x}_2 + \bar{x}_2(1 - a))/2 = \bar{x}_2(1 - a/2). \tag{4} \]


Assuming a constant rate of conditioning and using (3) and (4), an estimate of the telescoping
effect under conditioning, \(b_c\), is:


\[ b_c = (\bar{x}_U / \bar{x}_B)(1 - a)(1 - a/2) - 1. \tag{5} \]
Symmetry in Flows Among Reported Victimization
Classifications with Nonresponse 


ELIZABETH A. STASNY¹


ABSTRACT 


The United States’ National Crime Survey is a large-scale, household survey used to provide estimates 
of victimizations. The National Crime Survey uses a rotating panel design under which sampled housing 
units are maintained in the sample for three-and-one-half years with residents of the housing units being 
interviewed every six months. Nonresponse is a serious problem in longitudinal data from the National 
Crime Survey since as few as 25% of all individuals interviewed for the survey are respondents over an 
entire three-and-one-half-year period. In addition, the nonresponse typically does not occur at random 
with respect to victimization status. This paper presents models for gross flows among two types of 
victimization reporting classifications: number of victimizations and seriousness of victimization. The 
models allow for random or nonrandom nonresponse mechanisms, and allow the probabilities underlying 
the gross flows to be either unconstrained or symmetric. The models are fit, using maximum likelihood 
estimation, to the data from the National Crime Survey. 


KEY WORDS: Categorical data; Ignorable nonresponse; Longitudinal survey; National Crime Survey; 
Nonignorable nonresponse. 


1. INTRODUCTION 


The United States’ National Crime Survey (NCS) is a large-scale, household survey con- 
ducted by the U.S. Bureau of the Census for the Bureau of Justice Statistics. Data from the 
NCS are used to produce quarterly estimates of victimization rates and yearly estimates of the
prevalence of crime. The survey uses a rotating panel of housing units (HU’s) under which 
individuals living in sampled HU’s are interviewed up to seven times at six-month intervals. 

Individuals interviewed for the NCS are asked about crimes committed against them or 
against their property in the previous six months. In this work, we begin to explore the vic- 
timization status reported by households (HH’s) within sampled HU’s from one interview to 
the next. Victimization status for a HH will be considered in two ways: by the number of crimes 
reported (zero, one, and two or more) and by the type of crime reported (no crime, property 
crime, and personal contact crime). 

Since responses are not available from one NCS interview period to the next for all HH’s, 
we must decide how to handle missing observations. The nonresponse problem is a serious 
problem in the longitudinal data available from the NCS. For example, Fienberg (1980) noted 
that complete, three-and-one-half-year records of NCS interviews are available for as few as 
25% of all individuals interviewed. In addition, the nonresponse typically does not occur at 
random with respect to victimization status (see, for example, Saphire (1984)). 

This work extends the models developed by Stasny (1986) for nonrandom nonresponse in 
estimating gross flows. In particular, the models presented here allow for symmetry in the matrix 
of flows among victimization classifications as well as allowing for completely random 
nonresponse, ignorable nonrandom nonresponse, or nonignorable nonresponse. 


¹ Elizabeth A. Stasny, Department of Statistics, The Ohio State University, 1958 Neil Avenue, Columbus, Ohio
43210, USA. 
Section 2 of this paper provides a brief description of the NCS and the longitudinal data
from the survey. Section 3 gives a general form of the models for symmetry in gross flow 
matrices with missing data and presents iterative procedures for obtaining maximum likelihood 
estimators (MLE’s) for the parameters of the models. Section 4 describes the fits of the models 
to data from the NCS. Section 5 presents conclusions and suggests areas for future research. 


2. THE NATIONAL CRIME SURVEY AND DATA 


2.1 Survey Design 


The NCS is a stratified, multi-stage, cluster sample of HU’s. The survey was begun in July 
1972 by the Law Enforcement Assistance Administration but has been administered by the 
Bureau of Justice Statistics since December 1979. The target population for the NCS is the 
civilian, non-institutionalized population of persons aged 12 and over living in housing units. 
The survey provides information on personal and household crimes committed against the indi- 
viduals in sampled HU’s. The following crimes and attempted crimes are covered by the NCS: 
assault, auto or motor vehicle theft, burglary, larceny, rape, and robbery. Crimes not covered 
by the survey include kidnapping, murder, shoplifting, and crimes that occur at places of business. 

The NCS uses a rotating panel design under which a sampled HU is maintained in the sample 
for three and one-half years with interviews conducted at six-month intervals for a total of seven 
possible interviews. The initial interview at each HU, however, serves as a bounding interview 
and is not used for the purpose of estimation. Although there is a six-month interval between 
interviews at any one HU, NCS interviews are conducted in every month of the year; in order 
to make efficient use of trained interviewers, one-sixth of the HU’s in the sample are scheduled 
for interviews each month. Since the sampling unit for the NCS is the HU, no attempt is made 
to follow individuals who move away from the HU during the three-and-one-half-year period. 
Rather, new individuals entering the HU are included in the survey. Each different group of 
individuals who live in a HU during its time in the NCS sample is considered a separate HH. 

NCS interviews are conducted for all individuals 12 years of age or older who live in the 
sampled HU at the time of the interview. During the interview, individuals are asked about 
crimes committed against them or against the household in the previous six months. A single 
HH respondent is asked a series of six screening questions to elicit information on crimes com- 
mitted against the HH (burglary, larceny, and motor vehicle theft). Then an eleven-question 
screener is used to elicit information from each individual in the HH concerning personal crimes 
committed against that individual (assault, rape, and robbery). An incident report is completed 
for each crime mentioned in response to the screening questions. 

Additional information on the design and history of the NCS is provided, for example, by 
the U.S. Department of Justice and Bureau of Justice Statistics (1981), Saphire (1984), Dodge 
and Skogan (1987), and Montagliani (1987). A new sample design for the NCS has been used 
since January 1986. Taylor (1987) describes the redesign of the NCS and research associated 
with the redesign effort. The data used in this work, however, were collected under the original 
NCS design. 


2.2 The Longitudinal Data 


The data used in this work are from a large, longitudinal data set which includes all the 
regular NCS interview information collected from January 1975 to June 1979 except for the 
HU’s that rotated into the sample in 1979. To make it easier to handle the data, this research 
uses only a subset of the data. The subset was created by taking a random start at the record 
for the eighth HU in the full data set and then every fifteenth record after that. The resulting
data set contains NCS records for 12,432 HU’s. Because the HU’s on the original longitudinal 
file are ordered in such a way that units from the same cluster appear together, the 1-in-15 
systematic sample should not include two or more HU’s from a single cluster. Thus, this research 
does not consider the problem of correlations among HU’s within clusters. 
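
The 1-in-15 systematic selection described above can be sketched in a few lines. This is an illustration only: the function name and the list of 100 identifiers are hypothetical stand-ins for the longitudinal file, which is assumed to be ordered so that housing units from the same cluster are adjacent.

```python
def systematic_sample(records, start, step):
    """Return every `step`-th record, beginning with the `start`-th (1-based)."""
    return records[start - 1::step]

# Hypothetical file of 100 housing-unit identifiers, numbered from 1.
housing_units = list(range(1, 101))
subset = systematic_sample(housing_units, start=8, step=15)
print(subset)   # [8, 23, 38, 53, 68, 83, 98]
```

Because selected records are 15 positions apart, two units from the same cluster are unlikely to be chosen, provided clusters contain fewer than 15 adjacent records.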


2.3 Flows Among Victimization Classifications 


The hierarchical, longitudinal data were used to create summary matrices for the years 1975, 
1976, 1977, and 1978 showing flows among reported victimization classifications from each 
HH’s first interview in a year to the HH’s second interview for the year. Note that, since NCS 
interviews are conducted every month of the year, the first interview may occur at any time 
from January through June and the second interview may occur in July through December. 
Depending on the month of the interview, the victimizations reported in the first interview are 
those that occurred between the previous July and May while those reported in the second inter- 
view occurred between January and November. Thus, the analysis here explores only the 
reporting of crimes from one interview to the next. It cannot, for example, address issues of 
change in victimization reporting at various times of the year except in a very general sense. 

It should be noted that during the time when the data were collected, a reference-period 
experiment was conducted using a sample of NCS HU’s. Since individuals in HU’s included 
in the experiment were asked to report victimizations for reference periods other than the usual 
six-month period, those HU’s were not used in this analysis. 

For the analyses here, each HH interviewed at least once during a given year was classified 
according to its reporting and victimization status at the two interview times. A victimization 
may have been reported by any member of the HH and may be against an individual or against 
the HH. Two sets of matrices showing victimization classifications are used in the analyses 
of Section 4. The matrices are given in Appendix I. 

The first set of matrices show cross-classifications of HH’s by the number of victimizations 
reported in the first and second interviews for each year. The classifications are: crime free 
(no victimizations reported), single crime (one victimization reported), multiple crime (two or 
more victimizations reported), and missing (HH did not respond or rotated out of the sample). 
The second set of matrices show cross-classifications of HH’s by the type of victimization 
reported. The classifications are: crime free, property crime (burglary, larceny, and motor 
vehicle theft), contact crime (rape, assault, robbery, purse snatching, and pocket picking), and 
missing. These type-of-crime groupings are the same as those used in the NCS. In cases where 
multiple crimes were reported by a single HH, the classification used is for the most serious 
crime reported (contact crimes are taken to be more serious than property crimes). 
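
The two classification schemes can be expressed as simple rules. The sketch below is a hedged illustration rather than the code used for the analysis: the function names and the representation of a household report as a list of crime names (or None when the HH is missing) are assumptions, and any crime name outside the two listed groups is ignored when the type is assigned.

```python
PROPERTY = {"burglary", "larceny", "motor vehicle theft"}
CONTACT = {"rape", "assault", "robbery", "purse snatching", "pocket picking"}

def classify_by_number(crimes):
    """crimes: list of reported crime names, or None if the HH is missing."""
    if crimes is None:
        return "missing"
    if len(crimes) == 0:
        return "crime free"
    return "single crime" if len(crimes) == 1 else "multiple crime"

def classify_by_type(crimes):
    """Use the most serious crime reported; contact outranks property."""
    if crimes is None:
        return "missing"
    if any(c in CONTACT for c in crimes):
        return "contact crime"
    if any(c in PROPERTY for c in crimes):
        return "property crime"
    return "crime free"

report = ["larceny", "assault"]
print(classify_by_number(report), "/", classify_by_type(report))
# multiple crime / contact crime
```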

Notice the large amount of nonresponse in the observed matrices shown in Appendix I. Only 
about 50% of the HH’s who responded in at least one of the two interviews responded at both 
interview periods. The models presented in the following section will allow us to handle this
nonresponse while exploring the structure of the underlying matrix of probabilities of flows 
among the victimization classifications. 


3. THE MODELS 


This section presents a general form of the models that will be used to explore gross flows 
among victimization classifications in the NCS data. The form of the models follows that pro- 
posed by Chen and Fienberg (1974) for contingency tables with completely and partially 
classified data. The models for nonresponse are those developed by Stasny (1986) as well as 
a model for random nonresponse. The model for symmetry in the flows, however, does not
appear in the previous work. The models are presented in a general form because they are 
applicable to problems other than estimating gross flows among victimization classifications 
using NCS data. 


3.1 Model for the Observed Data 


Consider observation units that respond to a survey in at least one of two interview periods. 
Suppose that, when a unit responds to the survey, that unit is classified into one of K classifica- 
tions. If a unit does not respond to the survey, that unit is classified as missing. Then the 
interview-to-interview flow data may be represented as in Table 1. 


Table 1
Summary of Observed Data

                               Time 2
                    1       2      ...      K      Missing

           1                                       x_{1M}
           2                                       x_{2M}
Time 1     :              {x_{ij}}                   :
           K                                       x_{KM}
        Missing   x_{M1}  x_{M2}   ...    x_{MK}     -


where \(x_{ij}\) = number of units with survey or missing status i at time 1 and j at time 2.


We suppose that each unit would fall into one of the cells of the \(K \times K\) matrix of survey
classifications if it were observed at both interview times. Let \(p_{ij}\) be the probability that a unit
has status i at time 1 and status j at time 2, where i and j take on the values 1, 2, ..., K. Each
unit in the (i,j) cell of the matrix of survey classifications has a chance of being missing at
one of the two survey times. Let \(\lambda_{tij}\) be the probability that a unit in the (i,j) cell loses its
classification at time t and, hence, is classified as missing at that time. Then the probabilities
underlying the observed data are as shown in Table 2.


Table 2
Probabilities Underlying Observed Data

                               Time 2
                    1       2      ...      K                  Missing

           1
           2
Time 1     :     {(1 - \lambda_{1ij} - \lambda_{2ij}) p_{ij}}       {\sum_j p_{ij} \lambda_{2ij}}
           K

        Missing        {\sum_i p_{ij} \lambda_{1ij}}                     -
Assuming that the \(p_{ij}\) are probabilities from a multinomial distribution, the likelihood
function for the observed data is proportional to


\[ \left\{ \prod_{i=1}^{K} \prod_{j=1}^{K} \left[ (1 - \lambda_{1ij} - \lambda_{2ij})\, p_{ij} \right]^{x_{ij}} \right\}
\times \left\{ \prod_{i=1}^{K} \left[ \sum_{j=1}^{K} p_{ij} \lambda_{2ij} \right]^{x_{iM}} \right\}
\times \left\{ \prod_{j=1}^{K} \left[ \sum_{i=1}^{K} p_{ij} \lambda_{1ij} \right]^{x_{Mj}} \right\}. \]


There are \(3K^2 + 2K - 1\) free parameters defined above and only \(K^2 + 2K\) observed cells of
data with a single constraint on the total sample size. Thus there are too many parameters to
estimate using the observed data and we must reduce the number of parameters in the model.
In the following we reduce the number of parameters to be estimated by considering two models
for the \(p_{ij}\)-parameters and six models for the \(\lambda_{tij}\)-parameters.
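
To make the structure of the observed-data probabilities concrete, the following sketch computes them for a small hypothetical example. It assumes, consistent with the definitions in Section 3.1, that a unit in cell (i,j) is missing at time 1 with probability λ_1ij, missing at time 2 with probability λ_2ij, and otherwise observed at both times. The function name and all numerical values are illustrative.

```python
def observed_cell_probabilities(p, lam1, lam2):
    """p[i][j]: flow probabilities; lam1/lam2[i][j]: prob. of missing at time 1/2."""
    K = len(p)
    both = [[(1 - lam1[i][j] - lam2[i][j]) * p[i][j] for j in range(K)] for i in range(K)]
    missing_time2 = [sum(p[i][j] * lam2[i][j] for j in range(K)) for i in range(K)]
    missing_time1 = [sum(p[i][j] * lam1[i][j] for i in range(K)) for j in range(K)]
    return both, missing_time2, missing_time1

# Hypothetical K = 2 example with constant nonresponse probabilities.
p = [[0.7, 0.1], [0.1, 0.1]]
lam1 = [[0.2, 0.2], [0.2, 0.2]]
lam2 = [[0.3, 0.3], [0.3, 0.3]]
both, miss2, miss1 = observed_cell_probabilities(p, lam1, lam2)
total = sum(map(sum, both)) + sum(miss2) + sum(miss1)
print(round(total, 10))   # 1.0
```

The probabilities of the K² + 2K observed cells sum to one, which is the single constraint mentioned in the text.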


3.2 Models for the p and \ Probabilities 


We consider two models for the \(p_{ij}\)’s, the probabilities of flows among survey classifications:
the unconstrained model and the model of symmetric flows. Under the model of
unconstrained flow probabilities, there is a different probability, \(p_{ij}\), for every (i,j) cell of
the flow matrix. Under the model of symmetric flows, we have \(p_{ij} = p_{ji}\) for \(i \neq j\) so that the
probability that a unit has survey classification i at time 1 and j at time 2 is the same as the
probability that a unit has survey classification j at time 1 and i at time 2. Note that symmetry
in the cell probabilities of the flow matrix implies equality of row and column marginal totals.
Thus the model of symmetry in flow probabilities implies a certain stability in the population
since the expected number of units with a particular survey classification at time 1 is the same
as the number with that classification at time 2.
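
The link between symmetry and equal marginals is easy to verify numerically. The sketch below uses a hypothetical symmetric 3 x 3 matrix of flow probabilities; the values and the function name are illustrative and are not taken from the NCS data.

```python
def marginals(p):
    """Row (time 1) and column (time 2) marginal totals of a square matrix."""
    rows = [sum(row) for row in p]
    cols = [sum(col) for col in zip(*p)]
    return rows, cols

# Hypothetical symmetric 3 x 3 flow matrix (p[i][j] == p[j][i]).
p = [[0.60, 0.08, 0.02],
     [0.08, 0.10, 0.03],
     [0.02, 0.03, 0.04]]
rows, cols = marginals(p)
print(all(abs(r - c) < 1e-12 for r, c in zip(rows, cols)))   # True
```

The converse does not hold in general: for K greater than 2, a matrix can have equal row and column marginals without being symmetric.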

As defined above, the \(\lambda_{tij}\)’s, the probabilities that units with survey classifications i at time
1 and j at time 2 are missing at time t, depend on the time at which the nonresponse occurs
and on the survey classifications at both times 1 and 2. We consider six simpler models for
these probabilities. These models, along with the associated degrees of freedom under both
models for the \(p_{ij}\), are given below:


                                                                          d.f. unconstrained $p_{ij}$    d.f. symmetric $p_{ij}$ 
Model R: $\lambda_{tij} = \lambda$                                                  $2K - 1$                $(K^2 + 3K - 2)/2$ 
Model A: $\lambda_{1ij} = \lambda_{1j}$, $\lambda_{2ij} = \lambda_{2i}$                0                     $(K^2 - K)/2$ 
Model B: $\lambda_{tij} = \lambda_t$                                                $2K - 2$                $(K^2 + 3K - 4)/2$ 
Model C: $\lambda_{1ij} = \lambda_j$, $\lambda_{2ij} = \lambda_i$                      $K$                   $(K^2 + K)/2$ 
Model D: $\lambda_{1ij} = \lambda_{1i}$, $\lambda_{2ij} = \lambda_{2j}$                0                     $(K^2 - K)/2$ 
Model E: $\lambda_{1ij} = \lambda_i$, $\lambda_{2ij} = \lambda_j$                      $K$                   $(K^2 + K)/2$ 

Model R is the model of random nonresponse. Under Model R, there is a single probability 
of nonresponse for all units at both times regardless of survey classification. Under Model A, 
the probability that a unit is missing at time $t$ depends on both the time and the survey 
classification at the time when the unit responds. Note that if Model A is used for the 
λ-parameters and the unconstrained model is used for the $p_{ij}$, then the model is a saturated 
model which will fit the data exactly. Under Model B, the probability that a unit is missing at 
time $t$ depends only on the time. Under Model C, the probability that a unit is missing at time 
$t$ depends only on the unit's survey classification at the time when the unit responds. Under 
Model D, the probability that a unit is missing at time $t$ depends on both the time and the survey 
classification at the time when the unit is missing. If Model D is used for the λ-parameters and 
the unconstrained model is used for the $p_{ij}$, then the model is a saturated model which will fit 
the data exactly. Under Model E, the probability that a unit is missing at time $t$ depends only on 
the unit's survey classification at the time when the unit is missing. 
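
The degrees of freedom for each pairing can be checked by counting parameters. The sketch below assumes $K^2 + 2K$ observed cells with one constraint, $K^2 - 1$ free flow probabilities under the unconstrained model, $K(K+1)/2 - 1$ under the symmetric model, and 1, $2K$, 2, $K$, $2K$ and $K$ nonresponse parameters under Models R, A, B, C, D and E respectively. The function name is hypothetical.

```python
def degrees_of_freedom(model: str, k: int, symmetric: bool) -> int:
    """Residual degrees of freedom for a nonresponse model paired with a flow model."""
    # K^2 + 2K observed cells, minus one for the fixed total sample size.
    observed = k * k + 2 * k - 1
    # Free flow probabilities: symmetric model has K(K+1)/2 distinct cells.
    p_params = (k * (k + 1) // 2 - 1) if symmetric else (k * k - 1)
    # Number of distinct nonresponse probabilities under each model.
    lam_params = {"R": 1, "A": 2 * k, "B": 2, "C": k, "D": 2 * k, "E": k}[model]
    return observed - p_params - lam_params


# With K = 3 this gives 5 and 8 for Model R, 0 and 3 for Models A and D,
# 4 and 7 for Model B, and 3 and 6 for Models C and E.
for m in "RABCDE":
    print(m, degrees_of_freedom(m, 3, False), degrees_of_freedom(m, 3, True))
```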

Under Model R, nonresponse is said to be completely at random. Under Models A, B, and 
C, nonresponse is said to be ignorable nonresponse in that the nonresponse mechanism depends 
only on the observed data. Nonresponse under Models D and E is nonignorable nonresponse 
since the nonresponse mechanism depends on the missing data. (See Little and Rubin (1987) 
for more information on the types of nonresponse.) 

In the following two subsections, we describe procedures for fitting the models presented 
above. The fits of the models can be assessed using either the Pearson $X^2$ statistic or $G^2$, the 
likelihood ratio statistic. Both statistics have asymptotic $\chi^2$ distributions, with degrees of 
freedom as shown above, given that the model is correct. In the following we use the notation 
"Model R-U" to denote the pairing of Model R for the λ-parameters and the unconstrained 
model for the $p_{ij}$. "Model R-S" will denote the pairing of Model R for the λ-parameters and 
the symmetric model for the $p_{ij}$. Similar notation will be used to denote the pairings of Models 
A, B, C, D, and E for the λ-parameters with one of the two models for the $p_{ij}$. 
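
For reference, both fit statistics can be computed from the observed cell counts and the expected counts under a fitted model. This is a minimal sketch using the standard definitions; the function names are illustrative, and cells with zero observed count are taken to contribute nothing to $G^2$.

```python
import math


def pearson_x2(observed, expected):
    """Pearson X^2: sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio_g2(observed, expected):
    """Likelihood ratio G^2: 2 * sum over cells of observed * log(observed / expected)."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

Both functions take flat sequences of counts, so the $K^2 + 2K$ observed cells (flows plus the two sets of partially observed margins) would be listed in the same order in `observed` and `expected`.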


3.3 Estimation of the p and λ Parameters Under Models R, A, B, and C 


The likelihood functions for the eight models created using one of the two models for the 
$p_{ij}$ and Model R, A, B, or C for the $\lambda_{tij}$ factor into two pieces: one piece a function 
of the p-parameters alone and one a function of the λ-parameters alone. Thus, the MLE's may be 
found separately for the two sets of parameters. In addition, the p-parameter estimates do not 
depend on which of these four models is used for the λ-parameters, and the λ-parameter 
estimates do not depend on which of the two models is used for the p-parameters. 

An iterative procedure for obtaining MLE's for the p-parameters under the unconstrained 
model paired with Model R, A, B, or C for the λ-parameters is given in Chen and Fienberg 
(1974). The equations for this procedure are provided in Appendix II. 

Under the symmetric model for the p-parameters paired with Model R, A, B, or C for the 
λ-parameters, the factor of the likelihood equation involving only the $p_{ij}$'s is as follows: 

$$
\left\{ \prod_{i=1}^{K} \prod_{j=1}^{K} p_{ij}^{\,x_{ij}} \right\}
\times \left\{ \prod_{i=1}^{K} p_{i.}^{\,x_{iM} + x_{Mi}} \right\} \qquad (1)
$$

where a dot in a subscript indicates summation over that subscript. Equation (1) is maximized 
subject to the constraint that the sum of the $p_{ij}$'s is one. In general, an iterative procedure is 
required to obtain the MLE's. Let $x_{..} = \sum_{i=1}^{K} \sum_{j=1}^{K} x_{ij}$ be the total number 
of units observed at both times and let $n = x_{..} + x_{.M} + x_{M.}$ be the total number of units 
observed in at least one of the two interview times. Then the iterative procedure used in the data 
analysis reported in Section 4 is as follows: 


Iterative Procedure for Estimating Symmetric $p_{ij}$ Under Models R, A, B, and C 


1. $p_{ii}^{(0)} = x_{ii} / x_{..}$ 

   $p_{ij}^{(0)} = (x_{ij} + x_{ji}) / 2x_{..}$  for $i \neq j$. 

2. $p_{ii}^{(\nu+1)} = \left[ x_{ii} + (x_{iM} + x_{Mi})\, p_{ii}^{(\nu)} / p_{i.}^{(\nu)} \right] / n$ 

   $p_{ij}^{(\nu+1)} = \left[ (x_{ij} + x_{ji}) + (x_{iM} + x_{Mi})\, p_{ij}^{(\nu)} / p_{i.}^{(\nu)} + (x_{jM} + x_{Mj})\, p_{ij}^{(\nu)} / p_{j.}^{(\nu)} \right] / 2n$  for $i \neq j$. 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the parameter estimates converge to the desired 
degree of accuracy. The initial estimates given in step 1 are merely suggested estimates. Other 
positive values satisfying the constraint that the $p_{ij}$'s sum to one may be used. 
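
The two steps above can be sketched in code as follows. This is an illustrative implementation rather than the program used for the analysis: the function name, argument layout, tolerance and iteration cap are all assumptions, and every row of the starting matrix is assumed to have a positive sum.

```python
def fit_symmetric_flows(x, x_row_missing, x_col_missing, tol=1e-10, max_iter=10_000):
    """Iterate steps 1-2 for the symmetric flow probabilities (illustrative sketch).

    x[i][j]          -- units classified i at time 1 and j at time 2 (x_ij)
    x_row_missing[i] -- units classified i at time 1, missing at time 2 (x_iM)
    x_col_missing[j] -- units missing at time 1, classified j at time 2 (x_Mj)
    """
    k = len(x)
    x_both = sum(map(sum, x))                                  # x_..
    n = x_both + sum(x_row_missing) + sum(x_col_missing)
    # m[i] = x_iM + x_Mi: partially observed units with one known classification i.
    m = [x_row_missing[i] + x_col_missing[i] for i in range(k)]

    # Step 1: symmetrised proportions among the fully observed units.
    p = [[(x[i][j] + x[j][i]) / (2 * x_both) for j in range(k)] for i in range(k)]

    for _ in range(max_iter):
        row = [sum(p[i]) for i in range(k)]                    # p_i. (= p_.i by symmetry)
        # Step 2: for i == j this reduces to [x_ii + m_i * p_ii / p_i.] / n.
        new = [[((x[i][j] + x[j][i])
                 + m[i] * p[i][j] / row[i]
                 + m[j] * p[i][j] / row[j]) / (2 * n)
                for j in range(k)] for i in range(k)]
        change = max(abs(new[i][j] - p[i][j]) for i in range(k) for j in range(k))
        p = new
        if change < tol:
            break
    return p
```

Writing the diagonal and off-diagonal updates as a single expression is a convenience of this sketch; each update keeps the matrix symmetric and its entries summing to one.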

An iterative procedure for obtaining MLE's for the λ-parameters under Model A and the 
closed-form estimator for the λ-parameters under Model B are given in Chen and Fienberg 
(1974). An iterative procedure for obtaining MLE's for the λ-parameters under Model C is 
given in Stasny (1986). The equations for these procedures are provided in Appendix II. 

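Under Model R for the λ-parameters, the factor of the likelihood equation involving only 
$\lambda$ is as follows: 

$$
\left\{ \prod_{i=1}^{K} \prod_{j=1}^{K} (1 - 2\lambda)^{x_{ij}} \right\}
\times \left\{ \prod_{i=1}^{K} \lambda^{x_{iM}} \right\}
\times \left\{ \prod_{j=1}^{K} \lambda^{x_{Mj}} \right\}.
$$

The closed-form MLE for $\lambda$ is 

$$
\hat{\lambda} = (x_{.M} + x_{M.}) / 2n.
$$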
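
The closed form can be checked with a short derivation. This sketch assumes the λ-factor of the likelihood is $(1 - 2\lambda)^{x_{..}} \lambda^{x_{.M} + x_{M.}}$, which is consistent with the estimator stated above, and uses $n = x_{..} + x_{.M} + x_{M.}$:

```latex
\begin{align*}
\ell(\lambda) &= x_{..} \log(1 - 2\lambda) + (x_{.M} + x_{M.}) \log \lambda, \\
\frac{d\ell}{d\lambda} &= -\frac{2 x_{..}}{1 - 2\lambda} + \frac{x_{.M} + x_{M.}}{\lambda} = 0
\;\Longrightarrow\; (x_{.M} + x_{M.})(1 - 2\lambda) = 2 \lambda x_{..}, \\
\hat{\lambda} &= \frac{x_{.M} + x_{M.}}{2 (x_{..} + x_{.M} + x_{M.})} = \frac{x_{.M} + x_{M.}}{2n}.
\end{align*}
```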


3.4 Estimation of the p and λ Parameters Under Model D 


The likelihood functions for the observed data under either Model D-U or Model D-S cannot 
be factored and all parameter estimates must be obtained simultaneously. An iterative pro- 
cedure for obtaining MLE’s under Model D-U is given in Stasny (1988). The equations for 
this procedure are provided in Appendix II. Under Model D-S, the likelihood function for the 
observed data is as follows: 


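$$
\left\{ \prod_{i=1}^{K} \prod_{j=1}^{K} \left[ (1 - \lambda_{1i} - \lambda_{2j})\, p_{ij} \right]^{x_{ij}} \right\}
\times \left\{ \prod_{i=1}^{K} \left[ \sum_{j=1}^{K} \lambda_{2j}\, p_{ij} \right]^{x_{iM}} \right\}
\times \left\{ \prod_{j=1}^{K} \left[ \sum_{i=1}^{K} \lambda_{1i}\, p_{ij} \right]^{x_{Mj}} \right\},
\qquad p_{ij} = p_{ji}. \qquad (2)
$$

Equation (2) is maximized subject to the constraint that the sum of the $p_{ij}$'s is one. In 
general, an iterative procedure is required in order to obtain the MLE's. The iterative procedure 
used in the data analysis reported in Section 4 is as follows: 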


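Iterative Procedure for Estimating Parameters Under Model D-S 


1. $p_{ii}^{(0)} = x_{ii} / x_{..}$ 

   $p_{ij}^{(0)} = (x_{ij} + x_{ji}) / 2x_{..}$  for $i \neq j$ 

   $\lambda_{1i}^{(0)} = x_{M.} / n$,  $\lambda_{2j}^{(0)} = x_{.M} / n$. 

2. $p_{ii}^{(\nu+1)} = n^{-1} \left\{ x_{ii} + x_{iM} \left[ \lambda_{2i}^{(\nu)} p_{ii}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{2h}^{(\nu)} p_{ih}^{(\nu)} \right] + x_{Mi} \left[ \lambda_{1i}^{(\nu)} p_{ii}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{1h}^{(\nu)} p_{hi}^{(\nu)} \right] \right\}$ 

   $p_{ij}^{(\nu+1)} = (2n)^{-1} \left\{ x_{ij} + x_{ji} + x_{iM} \left[ \lambda_{2j}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{2h}^{(\nu)} p_{ih}^{(\nu)} \right] + x_{jM} \left[ \lambda_{2i}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{2h}^{(\nu)} p_{jh}^{(\nu)} \right] \right.$ 
   $\left. \qquad + \; x_{Mi} \left[ \lambda_{1j}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{1h}^{(\nu)} p_{hi}^{(\nu)} \right] + x_{Mj} \left[ \lambda_{1i}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{1h}^{(\nu)} p_{hj}^{(\nu)} \right] \right\}$  for $i \neq j$ 

   $\lambda_{1i}^{(\nu+1)}$ = [update equation illegible in source] 

   $\lambda_{2j}^{(\nu+1)}$ = [update equation illegible in source] 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the parameter estimates converge to the desired 
degree of accuracy. The initial estimates given in step 1 are merely suggested estimates. Other 
values between zero and one satisfying the constraint that the $p_{ij}$'s sum to one may be used. 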


3.5 Estimation of the p and λ Parameters Under Model E 


The likelihood functions for the observed data under either Model E-U or Model E-S cannot 
be factored and all parameter estimates must be obtained simultaneously. An iterative pro- 
cedure for obtaining MLE’s under Model E-U is given in Stasny (1988). The equations for this 
procedure are provided in Appendix II. Under Model E-S, the likelihood function for the 
observed data is as follows: 



$$
\left\{ \prod_{i=1}^{K} \prod_{j=1}^{K} \left[ (1 - \lambda_{i} - \lambda_{j})\, p_{ij} \right]^{x_{ij}} \right\}
\times \left\{ \prod_{i=1}^{K} \left[ \sum_{j=1}^{K} \lambda_{j}\, p_{ij} \right]^{x_{iM} + x_{Mi}} \right\},
\qquad p_{ij} = p_{ji}. \qquad (3)
$$

Equation (3) is maximized subject to the constraint that the sum of the $p_{ij}$'s is one. In 
general, an iterative procedure is required in order to obtain the MLE's. The iterative procedure 
used in the data analysis reported in Section 4 is as follows: 


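Iterative Procedure for Estimating Parameters Under Model E-S 


1. $p_{ii}^{(0)} = x_{ii} / x_{..}$ 

   $p_{ij}^{(0)} = (x_{ij} + x_{ji}) / 2x_{..}$  for $i \neq j$ 

   $\lambda_{i}^{(0)} = (x_{M.} + x_{.M}) / 2n$. 

2. $p_{ii}^{(\nu+1)} = n^{-1} \left\{ x_{ii} + (x_{iM} + x_{Mi}) \left[ \lambda_{i}^{(\nu)} p_{ii}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{h}^{(\nu)} p_{ih}^{(\nu)} \right] \right\}$ 

   $p_{ij}^{(\nu+1)} = (2n)^{-1} \left\{ x_{ij} + x_{ji} + (x_{iM} + x_{Mi}) \left[ \lambda_{j}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{h}^{(\nu)} p_{ih}^{(\nu)} \right] \right.$ 
   $\left. \qquad + \; (x_{jM} + x_{Mj}) \left[ \lambda_{i}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{h}^{(\nu)} p_{jh}^{(\nu)} \right] \right\}$  for $i \neq j$ 

   $\lambda_{i}^{(\nu+1)}$ = [update equation illegible in source] 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the parameter estimates converge to the desired 
degree of accuracy. The initial estimates given in step 1 are merely suggested estimates. 
Other values between zero and one satisfying the constraint that the $p_{ij}$'s sum to one may be 
used. 


4. FITS OF THE MODELS TO NCS DATA 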


The models described in Section 3 were fit to the NCS data described in Section 2. Recall 
that the NCS data for each of the years from 1975 to 1978 is summarized both by number of 
crimes reported in each of the two interviews during the year and by the type of crime reported. 
Since three survey classifications are used, we have K = 3. Standard errors of the parameter 
estimates were obtained using the observed information matrix. 


Table 3a 

Estimates of $p_{ij}$ for Flows Among Number-of-Crime Classifications 
Under Models R, A, B, and C 

                        Unconstrained Model         Symmetric Model
                        Second Interview            Second Interview
First Interview         Crime   Single  Multiple    Crime   Single  Multiple
                        Free    Crime   Crime       Free    Crime   Crime
1975  Crime Free         .666    .098    .029        .666    .102    .032
                        (.0075) (.0050) (.0031)     (.0075) (.0035) (.0022)
      Single Crime       .106    .029    .014        .102    .029    .012
                        (.0051) (.0031) (.0023)     (.0035) (.0031) (.0015)
      Multiple Crime     .036    .011    .012        .032    .012    .012
                        (.0032) (.0021) (.0021)     (.0022) (.0015) (.0021)
1976  Crime Free         .669    .101    .029        .669    .099    .030
                        (.0076) (.0052) (.0033)     (.0076) (.0036) (.0022)
      Single Crime       .098    .034    .014        .099    .034    .014
                        (.0051) (.0034) (.0025)     (.0036) (.0034) (.0017)
      Multiple Crime     .031    .014    .011        .030    .014    .010
                        (.0030) (.0023) (.0022)     (.0022) (.0017) (.0022)
1977  Crime Free         .670    [illegible]  .032   .671    .103    .030
                        (.0079) (.0058) (.0034)     (.0079) (.0037) (.0023)
      Single Crime       .092    .026    .016        .103    .026    .016
                        (.0051) (.0032) (.0026)     (.0037) (.0032) (.0018)
      Multiple Crime     .028    .016    .006        .030    .016    .006
                        (.0030) (.0026) (.0017)     (.0023) (.0018) (.0017)
1978  Crime Free         .671    .097    .027        .671    .105    .027
                        (.0087) (.0062) (.0035)     (.0087) (.0043) (.0025)
      Single Crime       [illegible]  .032   .009    .105    .032    .010
                        (.0061) (.0040) (.0022)     (.0043) (.0040) (.0017)
      Multiple Crime     .027    .013    .013        .027    .010    .013
                        (.0034) (.0027) (.0026)     (.0025) (.0017) (.0026)

Note: Estimated standard errors are given in parentheses. 


4.1 Estimates of the p-Parameters Under Models R, A, B, and C 


Recall that the p-parameter estimates do not depend on the nonresponse mechanism under 
Models R, A, B, and C. For the iterative procedures used to estimate the $p_{ij}$ under both the 
unconstrained and symmetric models, the criterion used for stopping the iteration was that 
the expected counts in the $(i,j)$ cell of the flow matrix, $n\hat{p}_{ij}$, differed by no more than 
0.5 from one step of the iterative procedure to the next. In all cases, convergence occurred 
rapidly, taking at most six steps. The estimates of the $p_{ij}$ when HH's are classified by numbers 
of crimes reported are given in Table 3a for both the unconstrained and symmetric models. The 
estimates of the $p_{ij}$ when HH's are classified by types of crimes reported are given in Table 
4a for both the unconstrained and symmetric models. 
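
The stopping rule described above, based on expected cell counts rather than on the probabilities themselves, can be sketched as a small helper. The function name and signature are hypothetical.

```python
def expected_counts_converged(p_old, p_new, n, max_change=0.5):
    """True when every expected count n * p_ij moved by at most max_change in one step."""
    return all(
        abs(n * (new - old)) <= max_change
        for row_old, row_new in zip(p_old, p_new)
        for old, new in zip(row_old, row_new)
    )
```

Because the criterion is stated on the count scale, the implied tolerance on the probabilities is 0.5 / n, so it tightens automatically as the sample size grows.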


Table 3b 

Estimates of $p_{ij}$ for Flows Among Number-of-Crime Classifications 
Under Model D-S 

                        Symmetric Model
                        Second Interview
First Interview         Crime   Single  Multiple
                        Free    Crime   Crime
1975  Crime Free         .638    .106    .035
                        (.0104) (.0047) (.0029)
      Single Crime       .106    .033    .015
                        (.0047) (.0039) (.0019)
      Multiple Crime     .035    .015    .016
                        (.0029) (.0019) (.0027)
1976  Crime Free         .645    .100    .034
                        (.0100) (.0045) (.0029)
      Single Crime       .100    .037    .017
                        (.0045) (.0041) (.0021)
      Multiple Crime     .034    .017    .015
                        (.0029) (.0021) (.0029)
1977  Crime Free         .642    .106    .033
                        (.0109) (.0054) (.0032)
      Single Crime       .106    .031    .021
                        (.0054) (.0043) (.0023)
      Multiple Crime     .033    .021    .009
                        (.0032) (.0023) (.0025)
1978  Crime Free         .636    .114    .028
                        (.0118) (.0056) (.0029)
      Single Crime       .114    .040    .013
                        (.0056) (.0051) (.0021)
      Multiple Crime     .028    .013    .015
                        (.0029) (.0021) (.0030)

Note: Estimated standard errors are given in parentheses. 


Notice in both Tables 3a and 4a that the flow matrices of estimated probabilities under the 
unconstrained model for the $p_{ij}$ appear to be fairly symmetric so that the model of symmetry 
in the flows is suggested as a reasonable model to consider. Also notice that the estimates 
of the $p_{ij}$ do not appear to change much over the four years. The fits of these two models for 
the $p_{ij}$ will be considered for each of the four models for nonresponse in Subsection 4.4 
below. 


Table 3c 

Estimates of $p_{ij}$ for Flows Among Number-of-Crime Classifications 
Under Models E-U and E-S 

                        Unconstrained Model         Symmetric Model
                        Second Interview            Second Interview
First Interview         Crime   Single  Multiple    Crime   Single  Multiple
                        Free    Crime   Crime       Free    Crime   Crime
1975  Crime Free         .639    .102    .031        .639    .106    .035
                        (.0104) (.0061) (.0037)     (.0104) (.0047) (.0028)
      Single Crime       .110    .033    .016        .106    .033    .015
                        (.0061) (.0039) (.0026)     (.0047) (.0039) (.0019)
      Multiple Crime     .039    .014    .016        .035    .015    .016
                        (.0039) (.0025) (.0027)     (.0028) (.0019) (.0027)
1976  Crime Free         .645    .103    .032        .645    .101    .033
                        (.0100) (.0063) (.0041)     (.0100) (.0045) (.0029)
      Single Crime       .098    .037    .017        .101    .037    .017
                        (.0057) (.0041) (.0030)     (.0045) (.0041) (.0021)
      Multiple Crime     .035    .017    .016        .033    .017    .016
                        (.0037) (.0027) (.0029)     (.0029) (.0021) (.0029)
1977  Crime Free         .636    .124    .037        .642    .106    .033
                        (.0112) (.0083) (.0050)     (.0110) (.0055) (.0033)
      Single Crime       .094    .031    .021        .106    .030    .020
                        (.0060) (.0043) (.0031)     (.0055) (.0043) (.0023)
      Multiple Crime     .029    .020    .008        .033    .020    .008
                        (.0036) (.0031) (.0024)     (.0033) (.0023) (.0025)
1978  Crime Free         .639    .106    .029        .637    .112    .028
                        (.0118) (.0078) (.0042)     (.0118) (.0055) (.0029)
      Single Crime       [illegible]  .041   .011    .112    .041    .013
                        (.0070) (.0051) (.0026)     (.0055) (.0051) (.0021)
      Multiple Crime     .027    .016    .015        .028    .013    .015
                        (.0037) (.0032) (.0030)     (.0029) (.0021) (.0030)

Note: Estimated standard errors are given in parentheses. 


Table 4a 

Estimates of $p_{ij}$ for Flows Among Type-of-Crime Classifications 
Under Models R, A, B, and C 

                        Unconstrained Model         Symmetric Model
                        Second Interview            Second Interview
First Interview         Crime   Property Contact    Crime   Property Contact
                        Free    Crime    Crime      Free    Crime    Crime
1975  Crime Free         .666    .105    .022        .666    .111    .024
                        (.0075) (.0053) (.0026)     (.0075) (.0037) (.0018)
      Property Crime     .118    .044    .010        .111    .044    .008
                        (.0054) (.0038) (.0019)     (.0037) (.0038) (.0013)
      Contact Crime      .025    .007    .004        .024    .008    .004
                        (.0026) (.0016) (.0012)     (.0018) (.0013) (.0012)
1976  Crime Free         .669    .108    .023        .669    .108    .022
                        (.0076) (.0055) (.0028)     (.0021) (.0011) (.0010)
      Property Crime     .108    .047    .010        .108    .047    .011
                        (.0053) (.0040) (.0021)     (.0011) (.0019) (.0009)
      Contact Crime      .021    .012    .002        .022    .011    .002
                        (.0025) (.0021) (.0011)     (.0010) (.0009) (.0012)
1977  Crime Free         .670    .128    .019        .671    .115    .018
                        (.0079) (.0061) (.0026)     (.0078) (.0039) (.0018)
      Property Crime     .103    .041    .008        .115    .041    .008
                        (.0053) (.0039) (.0018)     (.0039) (.0040) (.0014)
      Contact Crime      .016    .008    .006        .018    .008    .006
                        (.0025) (.0021) (.0018)     (.0018) (.0014) (.0017)
1978  Crime Free         .671    .104    .019        .671    [illegible]  .019
                        (.0087) (.0064) (.0031)     (.0088) (.0044) (.0021)
      Property Crime     .119    .040    .010        [illegible]  .040   .010
                        (.0063) (.0044) (.0024)     (.0044) (.0044) (.0017)
      Contact Crime      .019    .011    .006        .019    .010    .006
                        (.0029) (.0025) (.0020)     (.0021) (.0017) (.0020)

Note: Estimated standard errors are given in parentheses. 


Table 4b 

Estimates of $p_{ij}$ for Flows Among Type-of-Crime Classifications 
Under Model D-S 

                        Symmetric Model
                        Second Interview
First Interview         Crime   Property Contact
                        Free    Crime    Crime
1975  Crime Free         .635    .118    .026
                        (.0101) (.0046) (.0026)
      Property Crime     .118    .052    .011
                        (.0046) (.0046) (.0016)
      Contact Crime      .026    .011    .005
                        (.0026) (.0016) (.0016)
1976  Crime Free         .641    .110    .026
                        (.0098) (.0046) (.0028)
      Property Crime     .110    .052    .015
                        (.0046) (.0048) (.0021)
      Contact Crime      .026    .015    .004
                        (.0028) (.0021) (.0019)
1977  Crime Free         .642    .120    .019
                        (.0104) (.0052) (.0024)
      Property Crime     .120    .050    .011
                        (.0052) (.0049) (.0019)
      Contact Crime      .019    .011    .008
                        (.0024) (.0019) (.0022)
1978  Crime Free         .636    .121    .020
                        (.0117) (.0057) (.0025)
      Property Crime     .121    .049    .012
                        (.0057) (.0055) (.0021)
      Contact Crime      .020    .012    .008
                        (.0025) (.0021) (.0025)

Note: Estimated standard errors are given in parentheses. 


Table 4c 

Estimates of $p_{ij}$ for Flows Among Type-of-Crime Classifications 
Under Models E-U and E-S 

                        Unconstrained Model         Symmetric Model
                        Second Interview            Second Interview
First Interview         Crime   Property Contact    Crime   Property Contact
                        Free    Crime    Crime      Free    Crime    Crime
1975  Crime Free         .636    [illegible]  .024   .636    .117    .026
                        (.0100) (.0062) (.0034)     (.0101) (.0046) (.0026)
      Property Crime     .124    .053    .012        .117    .052    .011
                        (.0063) (.0047) (.0023)     (.0046) (.0047) (.0016)
      Contact Crime      .027    .009    .005        .026    .011    .005
                        (.0033) (.0020) (.0016)     (.0026) (.0016) (.0016)
1976  Crime Free         .641    .110    .028        .641    .110    .026
                        (.0098) (.0065) (.0041)     (.0098) (.0046) (.0028)
      Property Crime     .110    .051    .014        .110    .052    .015
                        (.0059) (.0048) (.0028)     (.0046) (.0048) (.0021)
      Contact Crime      .024    .016    .005        .026    .015    .005
                        (.0033) (.0028) (.0019)     (.0028) (.0021) (.0019)
1977  Crime Free         .636    .138    .023        .641    .121    .019
                        (.0108) (.0076) (.0035)     (.0105) (.0051) (.0024)
      Property Crime     .107    .050    .010        .121    .049    .011
                        (.0060) (.0048) (.0022)     (.0051) (.0048) (.0018)
      Contact Crime      .015    .011    .009        .019    .011    .009
                        (.0028) (.0027) (.0023)     (.0024) (.0018) (.0022)
1978  Crime Free         .641    .111    .022        .640    .118    .021
                        (.0117) (.0078) (.0040)     (.0117) (.0056) (.0026)
      Property Crime     .124    .048    .012        .118    .048    .013
                        (.0071) (.0055) (.0029)     (.0056) (.0054) (.0021)
      Contact Crime      .020    .014    .009        .021    .013    .008
                        (.0033) (.0031) (.0025)     (.0026) (.0021) (.0025)

Note: Estimated standard errors are given in parentheses. 


4.2 Estimates of the λ-Parameters Under Models R, A, B, and C 


Recall that the λ-parameter estimates under Models R, A, B, and C are the same regardless 
of whether the unconstrained or symmetric model is used for the p-parameters. For the iterative 
procedures used to estimate the λ-parameters under Models A and C, the convergence criterion 
used was that estimates of the λ-parameters differed by no more than .0005 from one step 
to the next. Convergence took between 41 and 4150 steps when it occurred in fewer than 
10,000 steps after using the initial parameter estimates suggested in Appendix II. The factors 
of the likelihood for the observed data involving only the λ-parameters were, in some cases, 
not well behaved. This is particularly true for the likelihoods for the 1978 data under both 
Models A and C. In such cases, a grid search was used to locate appropriate starting points 
for the iterative procedures. A rough grid search was also used in all cases to verify that, when 
the iterative procedure converged, it appeared to have converged to a global rather than a local 
maximum. 

The estimates of the λ-parameters under both the number-of-crimes and type-of-crime 
classifications for Models R, A, B, and C are given in Tables 5, 6, 7, and 8 respectively. 

Notice that under Models R and B the estimates of the λ-parameters are the same for 
both the number-of-crimes and type-of-crime classifications because the probability of being 
a nonrespondent under those two models does not depend on survey classification. Under 
Models A and C, the λ-parameter estimates corresponding to the crime-free classification 
are the same, within rounding error, for both the number-of-crimes and type-of-crime 
classifications since crime-free HH's are the same under both classifications. Also notice that, 
under Models A and C, the λ-parameter estimates, the estimated probabilities of being a 
nonrespondent, generally increase as the number of victimizations or the seriousness of the 
crime increases. 


Table 5 
Estimates of $\lambda$ Under Model R 

          Number-of-Crimes or Type-of-Crime 
          Classification of Data 
          $\hat{\lambda}$ 
1975       .224 
          (.0035) 
1976       [illegible] 
          (.0035) 
1977       [illegible] 
          (.0036) 
1978       .250 
          (.0040) 

Note: Estimated standard errors are given in parentheses. 


Table 6 
Estimates of $\lambda_{1j}$ and $\lambda_{2i}$ Under Model A 

Number-of-Crimes Classification of Data 
        λ_11          λ_12          λ_13          λ_21          λ_22          λ_23
1975    .208          [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
        (.0062)       (.0159)       (.0261)       (.0064)       (.0147)       (.0242)
1976    [illegible]   [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
        (.0063)       (.0152)       (.0268)       (.0066)       (.0153)       (.0248)
1977    .192          .263          .309          .258          .281          [illegible]
        (.0064)       (.0152)       (.0265)       (.0070)       (.0171)       (.0285)
1978    [illegible]*  [illegible]*  .302*         .269*         .280*         .321*
        (.0072)       (.0182)       (.0308)       (.0079)       (.0176)       (.0300)

Type-of-Crime Classification of Data 
        λ_11          λ_12          λ_13          λ_21          λ_22          λ_23
1975    .208          [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
        (.0062)       (.0151)       (.0321)       (.0064)       (.0139)       (.0303)
1976    [illegible]   [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
        (.0063)       (.0146)       (.0327)       (.0066)       (.0144)       (.0319)
1977    .192          [illegible]   [illegible]   .258          [illegible]   [illegible]
        (.0064)       (.0144)       (.0327)       (.0069)       (.0159)       (.0369)
1978    [illegible]*  .305*         .343*         .269*         .280*         [illegible]*
        (.0072)       (.0174)       (.0364)       (.0079)       (.0166)       (.0362)

Note: * Indicates cases in which the likelihood function is not well behaved. 
Estimated standard errors are given in parentheses. 


Table 7 
Estimates of $\lambda_1$ and $\lambda_2$ Under Model B 

        Number-of-Crimes or Type-of-Crime Classification of Data 
        λ_1           λ_2
1975    .228          .226
        (.0058)       (.0058)
1976    .225          .240
        (.0059)       (.0060)
1977    .209          .264
        (.0059)       (.0064)
1978    [illegible]   [illegible]
        (.0067)       (.0071)

Note: Estimated standard errors are given in parentheses. 


Table 8 
Estimates of $\lambda_i$ Under Model C 

        Number-of-Crimes Classification of Data       Type-of-Crime Classification of Data
        λ_1           λ_2           λ_3               λ_1           λ_2           λ_3
1975    .214          [illegible]   .300              .214          .262          .284
        (.0039)       (.0118)       (.0199)           (.0039)       (.0109)       (.0262)
1976    .224          [illegible]   .330              [illegible]   .266          .333
        (.0040)       (.0116)       (.0210)           (.0040)       (.0109)       (.0289)
1977    .225          [illegible]   .317              .225*         .273*         .339*
        (.0041)       (.0126)       (.0235)           (.0041)       (.0115)       (.0286)
1978    .237*         .297*         [illegible]*      [illegible]*  .292*         .339*
        (.0046)       (.0139)       (.0236)           (.0046)       (.0130)       (.0299)

Note: * Indicates cases in which the likelihood function is not well behaved. 
Estimated standard errors are given in parentheses. 


4.3 Parameter Estimates Under Models D and E 


Models D and E are more difficult to fit than Models R, A, B, and C because all parameters 
under Models D and E must be estimated simultaneously. For all sets of the NCS data, the 
likelihood functions under Models D and E were not well behaved and grid searches over 
the possible values of the λ-parameters were required to locate suitable starting points for 
the iterative procedure. Since a grid search over the six λ-parameters under Model D was 
extremely time-consuming, parameter estimates were obtained under Model D-S but not 
under Model D-U. Estimates of the p-parameters under Model D-S are given in Table 3b for 
the number-of-crimes classification and in Table 4b for the type-of-crime classification. 
The λ-parameter estimates under Model D-S are given in Table 9 for both types of 
classifications. Estimates of the p-parameters under Models E-U and E-S are given in Table 3c for 
the number-of-crimes classification and in Table 4c for the type-of-crime classification. The 
λ-parameter estimates under Models E-U and E-S are given in Table 10 for both types of 
classifications. 

Notice that under Models D and E the estimates of $p_{11}$, the probability of remaining in 
the crime-free classification, are somewhat smaller than the corresponding estimates under 
Models R, A, B, and C; the estimates of the remaining p-parameters under Models D and E 
are somewhat larger than the corresponding estimates under Models R, A, B, and C. Under 
both Models D and E, the λ-parameter estimates, the estimated probabilities of being a 
nonrespondent, generally increase as the number of victimizations or the seriousness of the 
crime increases. In the cases where the estimates decrease as the number of victimizations or 
the seriousness of the crime increases (in the 1978 data under Model D-S and in the 1978 
number-of-crimes data under Model E-S), the decreases are small and within the estimated 
standard error of the estimates. 


Table 9 
Estimates of $\lambda_{1i}$ and $\lambda_{2j}$ Under Model D-S 

Number-of-Crimes Classification of Data 
        λ_11          λ_12          λ_13          λ_21          λ_22          λ_23
1975    .210          .246          .319          .194          [illegible]   .387
        (.0085)       (.0303)       (.0368)       (.0085)       (.0282)       (.0362)
1976    .204          .276          [illegible]   .217          .213          .444
        (.0083)       (.0274)       (.0344)       (.0084)       (.0291)       (.0331)
1977    .175          .307          .380          .249          .298          .374
        (.0086)       (.0301)       (.0403)       (.0089)       (.0326)       (.0439)
1978    .211          .278          .290          .236          .413          .384
        (.0094)       (.0282)       (.0433)       (.0099)       (.0261)       (.0443)

Type-of-Crime Classification of Data 
        λ_11          λ_12          λ_13          λ_21          λ_22          λ_23
1975    .208          .264          .319          .192          .339          [illegible]
        (.0084)       (.0249)       (.0523)       (.0085)       (.0235)       (.0507)
1976    .203          .280          .383          .215          [illegible]   .453
        (.0083)       (.0244)       (.0443)       (.0084)       (.0255)       (.0416)
1977    [illegible]   .304          .438          .248          .315          .341
        (.0086)       (.0243)       (.0424)       (.0089)       (.0259)       (.0491)
1978    [illegible]   .276          [illegible]   .236          .411          .391
        (.0094)       (.0264)       (.0563)       (.0098)       (.0246)       (.0567)

Note: Estimated standard errors are given in parentheses. 


Table 10 
Estimates of $\lambda_i$ Under Model E 

        Number-of-Crimes Classification of Data       Type-of-Crime Classification of Data
        λ_1           λ_2           λ_3               λ_1           λ_2           λ_3
Unconstrained $p_{ij}$ 
1975    .202          .285          .348              .201          .302          .336
        (.0060)       (.0235)       (.0262)           (.0058)       (.0180)       (.0418)
1976    [illegible]   [illegible]   .387              .209          .286          .419
        (.0057)       (.0226)       (.0232)           (.0056)       (.0193)       (.0327)
1977    .210          [illegible]   [illegible]       .209          .318          .394
        (.0063)       (.0259)       (.0351)           (.0061)       (.0183)       (.0295)
1978    .224          .340          .342              [illegible]   .326          .385
        (.0065)       (.0208)       (.0296)           (.0065)       (.0203)       (.0333)
Symmetric $p_{ij}$ 
1975    .202          .285          .351              .201          .301          .341
        (.0060)       (.0235)       (.0258)           (.0059)       (.0180)       (.0408)
1976    [illegible]   .274          .389              .209          .287          .418
        (.0057)       (.0223)       (.0229)           (.0056)       (.0191)       (.0327)
1977    [illegible]   .301          .376              .213          .309          .391
        (.0061)       (.0267)       (.0339)           (.0060)       (.0190)       (.0302)
1978    .224          .343          .338              [illegible]   .329          .379
        (.0065)       (.0204)       (.0298)           (.0065)       (.0199)       (.0339)

Note: Estimated standard errors are given in parentheses. 


4.4 Fits of the Models 


Table 11 shows the $X^2$ and $G^2$ values and the associated degrees of freedom for all twelve 
models (including Model D-U which must fit the data exactly) and both types of survey 
classifications. Note that the models were fit as an illustration of the methods developed here 
and we have ignored the complex survey design. Although clusters are not a problem in our 
subsample of the NCS data, in a more complete analysis we would prefer to fit the models 
separately to data from different strata and then combine the strata estimates to obtain estimates 
for the entire population. 

Clearly, neither Model R, the model of random nonresponse, nor Model B, under which 
the probability of nonresponse depends only on time, fits the data well for either the 
unconstrained or symmetric models for the $p_{ij}$. 

Models C-U and C-S fit the 1975 data fairly well and give reasonable fits to the 1976 data. 
Since Model C-S fits the data reasonably well and is a more parsimonious model, we prefer 
it over Model C-U. Under Model C, the probability of nonresponse depends only on the vic- 
timization classification at the interview in which the HH responded, not on the time. Thus, 
Model C is the model of symmetry in the nonresponse probabilities for the two interview 
periods. When Model C is paired with the symmetric model for the p-parameters, we obtain 
symmetric expected cell counts for the observed flow data. Notice in the observed data shown 
in Appendix I, that in 1977 and 1978 there is much more nonresponse at the second interview 
time than at the first interview time. This difference in nonresponse rates is the reason for the 
lack of fit of Model C to the 1977 and 1978 data. 


Table 11 
Fits of the Models 

            Number-of-Crimes Classification of Data              Type-of-Crime Classification of Data
            Unconstrained $p_{ij}$     Symmetric $p_{ij}$         Unconstrained $p_{ij}$     Symmetric $p_{ij}$
            $X^2$        $G^2$        $X^2$        $G^2$        $X^2$        $G^2$        $X^2$        $G^2$
Model R     (d.f. = 5)                (d.f. = 8)                (d.f. = 5)                (d.f. = 8)
  1975      42.7         41.2         45.9         45.6         38.2         36.9         42.0         41.5
  1976      70.2         67.1         69.7         [illegible]  [illegible]  55.9         58.3         56.4
  1977      74.2         [illegible]  83.9         85.3         85.4         84.8         94.8         95.3
  1978      61.7         62.7         64.9         66.3         63.2         64.1         65.5         66.8
Model A     (d.f. = 0)                (d.f. = 3)                (d.f. = 0)                (d.f. = 3)
  1975      0.0          0.0          4.4          4.4          0.0          0.0          4.6          4.6
  1976      0.0          0.0          0.6          0.6          0.0          0.0          0.5          0.5
  1977      0.0          0.0          10.1         10.1         0.0          0.0          10.5         10.5
  1978      0.0          0.0          [illegible]  [illegible]  0.0          0.0          [illegible]  [illegible]
Model B     (d.f. = 4)                (d.f. = 7)                (d.f. = 4)                (d.f. = 7)
  1975      42.7         41.1         45.9         [illegible]  38.2         36.9         42.0         41.5
  1976      69.1         64.5         68.5         [illegible]  [illegible]  [illegible]  56.9         53.8
  1977      47.1         45.4         58.7         [illegible]  [illegible]  54.9         68.4         65.4
  1978      47.6         46.0         50.1         49.6         49.1         47.4         50.7         50.1
Model C     (d.f. = 3)                (d.f. = 6)                (d.f. = 3)                (d.f. = 6)
  1975      6.9          6.9          [illegible]  [illegible]  7.4          7.4          12.0         12.0
  1976      21.2         [illegible]  21.8         21.9         [illegible]  [illegible]  15.6         15.6
  1977      38.1         38.3         48.2         48.4         45.6         45.7         56.0         56.3
  1978      [illegible]  31.1         34.7         34.8         29.9         30.0         32.6         [illegible]
Model D     (d.f. = 0)                (d.f. = 3)                (d.f. = 0)                (d.f. = 3)
  1975      0.0          0.0          5.0          5.0          0.0          0.0          5.6          5.6
  1976      0.0          0.0          [illegible]  [illegible]  0.0          0.0          11.6         11.6
  1977      0.0          0.0          [illegible]  11.5         0.0          0.0          18.0         18.0
  1978      0.0          0.0          10.2         10.2         0.0          0.0          9.9          9.8
Model E     (d.f. = 3)                (d.f. = 6)                (d.f. = 3)                (d.f. = 6)
  1975      7.0          [illegible]  11.3         [illegible]  [illegible]  [illegible]  12.0         12.0
  1976      21.0         [illegible]  21.8         21.9         14.8         14.9         15.6         15.6
  1977      [illegible]  33.0         48.2         48.4         39.5         39.5         56.0         56.3
  1978      32.0         [illegible]  34.6         34.8         30.9         31.0         32.6         [illegible]

Note: $\chi^2_{.99}(3) = 11.34$, $\chi^2_{.99}(4) = 13.28$, $\chi^2_{.99}(5) = 15.09$, $\chi^2_{.99}(6) = 16.81$, $\chi^2_{.99}(7) = 18.48$, and $\chi^2_{.99}(8) = 20.09$. 


The fits of Models E-U and E-S are quite similar to those of Models C-U and C-S 
respectively. This is not surprising since the interpretations of the models are quite similar. 
Under Model C nonresponse depends on the survey classification when the HH responds, while 
under Model E it depends on the survey classification when the HH does not respond. Since the 
fits of these two models are similar, we cannot choose between the two models using the data 
alone. Logically, Model E seems more realistic since we might expect nonresponse to depend on 
the current victimization status. Since the two models provide similar fits to the data, it may be 
that the victimization status at the time when the HH responds is generally a good indicator 
for the victimization status when the HH does not respond. If that is the case, we would prefer 
to use Model C since it is easier to fit than Model E. 

Model A-S, under which nonresponse depends on both the time and the victimization 
status when the HH responds, fits the 1975, 1976, and 1978 data very well and gives a reasonable 
fit to the 1977 data. The fits of Model D-S are similar to those of Model A-S with the exception 
of the 1976 data, which are fit much better by Model A-S. Again we cannot choose between 
Models A and D based on the data alone. (Models A-U and D-U fit the data exactly.) In general, 
we are quite pleased with the fits of Model A-S to both the number-of-crimes and type-of-crime 
data from all four years. Since Model A provides a reasonable fit to all the data, we conclude 
that nonresponse in the NCS does depend on victimization status. 

Notice that, in most cases, the fits of the models as measured by $X^2$ and $G^2$ do not change 
much when the symmetric $p_{ij}$ model is used rather than the unconstrained $p_{ij}$ model. Since we 
gain 3 degrees of freedom going to the more parsimonious, symmetric model for the $p_{ij}$, we 
prefer this model to the unconstrained model for the $p_{ij}$. This choice of the symmetric model 
for the flow probabilities indicates that there is a certain amount of stability in victimizations 
reported in the first and second halves of the year in the NCS. This stability comes from the 
fact that symmetry in the underlying flow probabilities implies equality of marginal totals. Thus, 
the numbers of HH’s having no crimes, one crime, or two or more crimes remain about the 
same from the first interview of a year to the second. Similarly, the numbers of HH’s 
having no crimes, a property crime, or a contact crime remain about the same from the first 
interview of a year to the second. 
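
The two ideas in this paragraph, that symmetric flows force equal marginal totals and that fit is judged by $X^2$ and $G^2$, can be illustrated with a small Python sketch. It is a simplified illustration and not the procedure used in the paper: it uses only the HH’s that responded at both 1975 interviews (Appendix I, classification by number of victimizations), ignores nonresponse entirely, and assumes the symmetric expected count for cell $(i,j)$ is $(x_{ij} + x_{ji})/2$. The function names are hypothetical.

```python
import math

# 1975 households responding at both interviews (Appendix I):
# rows = first interview, columns = second interview.
x = [
    [1963, 256, 67],
    [306, 73, 31],
    [95, 26, 24],
]
K = len(x)

def symmetric_expected(table):
    """Expected counts under symmetry for complete data: (x_ij + x_ji) / 2."""
    k = len(table)
    return [[(table[i][j] + table[j][i]) / 2 for j in range(k)] for i in range(k)]

def pearson_x2(obs, exp):
    """Pearson chi-square statistic summed over all cells."""
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(obs, exp)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_g2(obs, exp):
    """Likelihood-ratio statistic; cells with a zero count contribute nothing."""
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(obs, exp)
                   for o, e in zip(obs_row, exp_row) if o > 0)

expected = symmetric_expected(x)

# Symmetry implies equal first- and second-interview marginal totals.
row_totals = [sum(expected[i]) for i in range(K)]
col_totals = [sum(expected[i][j] for i in range(K)) for j in range(K)]

x2 = pearson_x2(x, expected)
g2 = likelihood_ratio_g2(x, expected)

print(row_totals, col_totals)
print(round(x2, 2), round(g2, 2))
```

With $K = 3$ classifications, complete-data symmetry has $K(K-1)/2 = 3$ degrees of freedom, so the two statistics can be set against the 0.99 critical value of 11.34 quoted in the note to the table.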


5. CONCLUSIONS AND FUTURE WORK 


We have seen that the model of symmetry in the matrices of flows among victimization 
classifications, paired with a model under which nonresponse depends on both time and 
victimization status, provides a good fit to data summaries from the NCS. The same model 
fits the data when classification of HH’s is by number of crimes reported or by type of crime 
reported. 

The work described here is, of course, only an initial attempt to explore nonresponse and 
flows among victimization classifications in NCS data. For example, we noticed that the 
estimated symmetric probabilities of flows among the classifications did not appear to change 
much over the four-year period from 1975 to 1978 but the estimated probabilities of 
nonresponse did appear to change over this period. One might wish to fit a model to the NCS 
data which has constant flow probabilities but allows the nonresponse probabilities to change 
over time. If the nonresponse probabilities do actually change over time, not just from year 
to year but also from interview period to interview period, then it would be important to try 
to discover why these probabilities are changing. 

In the work presented here, all missing data were treated the same. In fact, data may be 
missing because a HU rotated out of the sample, because a HH moved into or out of the sampled 
HU, because no one was at home, because the HH refused to respond, or for some other reason. 
It may be reasonable to assume that data missing because a HU rotated out of the sample is 
missing at random, but that other types of nonresponse are not missing at random. Stasny (1988) 
presents models that allow for different types of nonresponse which could be used with the 
models of symmetry in flows presented here. In addition, the models here do not allow for 
HH’s which are missing at both interview periods. Since there are, of course, such HH’s, one 
may wish to explore Markov-chain models such as those given in Stasny (1987) which do handle 
nonresponse at both times. 

Most importantly, one may want to consider more natural summaries of the data than 
were used here. The data used here were summarized by first and second interview for the 
year. A more meaningful summary would be, say, by month or quarter of the year. If such 
summaries were used, then the complex nature of the interview schedule for the NCS would 
have to be considered and accounted for in the models. For example, the response status for 
a HH would be the same for the six-month reporting period covered at any one interview 
time. The development of models taking this into account is an important area for future 
work. 

APPENDIX I 

The Observed Data 

Classification by Number of Victimizations 

                                    Second Interview
Year  First Interview     Crime Free  Single Crime  Multiple Crime  Missing
1975  Crime Free                1963           256              67      901
      Single Crime               306            73              31      179
      Multiple Crime              95            26              24       83
      Missing                    866           193              91
1976  Crime Free                1884           257              53      951
      Single Crime               266            84              24      186
      Multiple Crime              82            34              18       75
      Missing                    831           197             106
1977  Crime Free                1742           260              66      994
      Single Crime               228            56              31      177
      Multiple Crime              63            31              10       76
      Missing                    716           194              79
1978  Crime Free                1370           157              45      831
      Single Crime               222            50              14      165
      Multiple Crime              50            18              19       66
      Missing                    651           174              57

Classification by Type of Crime 

                                    Second Interview
Year  First Interview     Crime Free  Property Crime  Contact Crime  Missing
1975  Crime Free                1963             271             52      901
      Property Crime             331             107             22      217
      Contact Crime               70              17              8       45
      Missing                    866             225             59
1976  Crime Free                1884             266             44      951
      Property Crime             295             111             19      211
      Contact Crime               53              26              4       50
      Missing                    831             235             68
1977  Crime Free                1742             283             43      994
      Property Crime             262              89             18      194
      Contact Crime               29              12              9       59
      Missing                    716             231             42
1978  Crime Free                1370             173             29      831
      Property Crime             238              64             14      184
      Contact Crime               34              15              8       47
      Missing                    651             184             47

APPENDIX II: Procedures for Obtaining MLE’s of the $p$ and $\lambda$ Parameters 


Note that $x_{\cdot\cdot} = \sum_{i=1}^{K} \sum_{j=1}^{K} x_{ij}$ is the total number of units responding at both times and 
$n = x_{\cdot\cdot} + x_{\cdot M} + x_{M\cdot}$ is the total sample size. The starting values given below for the iterative 
procedures are merely suggested values. Other positive values summing to one may be used 
as initial values for the $p$-parameter estimates, and other values between zero and one may be 
used as initial values for the $\lambda$-parameter estimates. 


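MLE’s for Unconstrained $p_{ij}$’s Under Models R, A, B, and C 

1. $p_{ij}^{(0)} = x_{ij}/x_{\cdot\cdot}$. 

2. $p_{ij}^{(\nu+1)} = \left[ x_{ij} + x_{iM}\, p_{ij}^{(\nu)} / p_{i\cdot}^{(\nu)} + x_{Mj}\, p_{ij}^{(\nu)} / p_{\cdot j}^{(\nu)} \right] / n$. 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the $p_{ij}$ parameter estimates converge to a desired 
degree of accuracy. 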
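
The iteration for the unconstrained flow probabilities can be sketched in Python as follows. This is a minimal illustration under stated assumptions: it reads step 2 as allocating each partially classified unit across cells in proportion to the current $p_{ij}$ and then dividing by $n$; the function and argument names are hypothetical; and the data are the 1975 counts by number of victimizations from Appendix I.

```python
def fit_flow_probabilities(x, x_im, x_mj, tol=1e-12, max_iter=10_000):
    """Iterate step 2 until the largest change in any p_ij is below tol.

    x[i][j] : units classified i at time 1 and j at time 2
    x_im[i] : units classified i at time 1, missing at time 2
    x_mj[j] : units missing at time 1, classified j at time 2
    """
    k = len(x)
    x_both = sum(sum(row) for row in x)
    n = x_both + sum(x_im) + sum(x_mj)
    # Step 1: start from the proportions among units observed at both times.
    p = [[x[i][j] / x_both for j in range(k)] for i in range(k)]
    for _ in range(max_iter):
        p_row = [sum(p[i]) for i in range(k)]
        p_col = [sum(p[i][j] for i in range(k)) for j in range(k)]
        # Step 2: share out the partially classified units in proportion to p.
        new_p = [[(x[i][j]
                   + x_im[i] * p[i][j] / p_row[i]
                   + x_mj[j] * p[i][j] / p_col[j]) / n
                  for j in range(k)] for i in range(k)]
        change = max(abs(new_p[i][j] - p[i][j])
                     for i in range(k) for j in range(k))
        p = new_p
        if change < tol:
            break
    return p

# 1975 data, classification by number of victimizations (Appendix I).
x = [[1963, 256, 67], [306, 73, 31], [95, 26, 24]]
x_im = [901, 179, 83]   # responded at first interview only
x_mj = [866, 193, 91]   # responded at second interview only

p_hat = fit_flow_probabilities(x, x_im, x_mj)
for row in p_hat:
    print([round(v, 4) for v in row])
```

Each pass keeps the estimates summing to one, because the three terms in step 2 sum to $x_{\cdot\cdot} + x_{\cdot M} + x_{M\cdot} = n$ over all cells.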


MLE’s for $\lambda$’s Under Model A 


1. $\lambda_{1j}^{(0)} = x_{M\cdot}/n$ and $\lambda_{2i}^{(0)} = x_{\cdot M}/n$. 


2. a) $\lambda_{1j}^{(\nu+1)} = x_{Mj} \Big/ \sum_{i=1}^{K} \left[ x_{ij} / (1 - \lambda_{1j}^{(\nu)} - \lambda_{2i}^{(\nu)}) \right]$ 

   b) $\lambda_{2i}^{(\nu+1)} = x_{iM} \Big/ \sum_{j=1}^{K} \left[ x_{ij} / (1 - \lambda_{1j}^{(\nu+1)} - \lambda_{2i}^{(\nu)}) \right]$ 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the $\lambda$-parameter estimates converge to the desired 
degree of accuracy. If $x_{hM} > \sum_{j=1}^{K} x_{hj}$ or $x_{Mh} > \sum_{i=1}^{K} x_{ih}$ for some $h$, so that of all units 
responding in a particular survey classification at one interview time more did not respond at 
the other interview time than did respond, then the corresponding parameter estimates will, 
at some step, fall outside of the 0 to 1 range and alternate formulas must be used in place of 
those given above (see Chen and Fienberg 1974). If for some $j$, $x_{Mj} > \sum_{i=1}^{K} x_{ij}$, then for that 
$j$, step 2a) given above is replaced by 


Myr? = 1 Ay? — (P| Xn) if L | UR I poate MP)]} (1 te) 


where $h$ is chosen at each step of the iteration so that $\lambda_{2h}^{(\nu)} \ge \lambda_{2i}^{(\nu)}$ for all $i = 1, 2, \ldots, K$. 
If for some $i$, $x_{iM} > \sum_{j=1}^{K} x_{ij}$, then for that $i$, step 2b) given above is replaced by 


ve tat?. fie NY = doy |. of dL | /Q = Nese? 4?) (1 = Mi a Ny?) 


where $h$ is chosen at each step of the iteration so that $\lambda_{1h}^{(\nu+1)} \ge \lambda_{1j}^{(\nu+1)}$ for all $j = 1, 2, \ldots, K$. 
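
Steps 1 and 2 for Model A can be sketched in Python for the case where the alternate formulas are not needed. This is an illustration under assumptions that are not confirmed by the text: the nonresponse parameters are written `lam1[j]` (missing at the first interview, classification $j$ at the second) and `lam2[i]` (missing at the second interview, classification $i$ at the first), and step 2b) is assumed to use the values just updated in step 2a). The function name is hypothetical and the data are the 1975 counts by number of victimizations from Appendix I.

```python
def fit_model_a_lambdas(x, x_im, x_mj, tol=1e-12, max_iter=5000):
    """Fixed-point iteration for the Model A nonresponse parameters.

    Assumes no classification has more one-time nonrespondents than
    two-time respondents, so the alternate formulas are not required.
    """
    k = len(x)
    n = sum(sum(row) for row in x) + sum(x_im) + sum(x_mj)
    # Step 1: common starting values for every classification.
    lam1 = [sum(x_mj) / n] * k
    lam2 = [sum(x_im) / n] * k
    for _ in range(max_iter):
        # Step 2a: update lam1[j] using the current lam1 and lam2.
        new1 = [x_mj[j] / sum(x[i][j] / (1 - lam1[j] - lam2[i])
                              for i in range(k))
                for j in range(k)]
        # Step 2b: update lam2[i] using the new lam1 (an assumption).
        new2 = [x_im[i] / sum(x[i][j] / (1 - new1[j] - lam2[i])
                              for j in range(k))
                for i in range(k)]
        change = max(abs(a - b) for a, b in zip(new1 + new2, lam1 + lam2))
        lam1, lam2 = new1, new2
        if change < tol:
            break
    return lam1, lam2

# 1975 data, classification by number of victimizations (Appendix I).
x = [[1963, 256, 67], [306, 73, 31], [95, 26, 24]]
x_im = [901, 179, 83]
x_mj = [866, 193, 91]

lam1, lam2 = fit_model_a_lambdas(x, x_im, x_mj)
print([round(v, 4) for v in lam1])
print([round(v, 4) for v in lam2])
```

For these data every $x_{Mj}$ is smaller than the corresponding column total and every $x_{iM}$ is smaller than the corresponding row total, so the estimates stay between 0 and 1 and the basic formulas apply.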

MLE’s for $\lambda$’s Under Model B 


$\hat{\lambda}_1 = x_{M\cdot}/n$ and $\hat{\lambda}_2 = x_{\cdot M}/n$. 
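
Because the Model B estimates are available in closed form, they can be computed directly. The short Python sketch below does this for the 1975 counts by number of victimizations from Appendix I; the variable names are hypothetical, and it assumes that the first parameter refers to nonresponse at the first interview and the second to nonresponse at the second interview.

```python
# 1975 data, classification by number of victimizations (Appendix I).
x = [[1963, 256, 67], [306, 73, 31], [95, 26, 24]]
x_im = [901, 179, 83]   # responded at first interview only
x_mj = [866, 193, 91]   # responded at second interview only

x_both = sum(sum(row) for row in x)      # units responding at both times
n = x_both + sum(x_im) + sum(x_mj)       # total sample size

lambda_1 = sum(x_mj) / n   # estimated probability of nonresponse at time 1
lambda_2 = sum(x_im) / n   # estimated probability of nonresponse at time 2
prob_both = 1 - lambda_1 - lambda_2      # estimated probability of responding twice

print(n, round(lambda_1, 4), round(lambda_2, 4), round(prob_both, 4))
```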

MLE’s for $\lambda$’s Under Model C 

1. $\lambda_i^{(0)} = (x_{iM} + x_{Mi})/2n$. 


2. $\lambda_i^{(\nu+1)} = (x_{iM} + x_{Mi}) \Big/ \sum_{j=1}^{K} \left[ (x_{ij} + x_{ji}) / (1 - \lambda_i^{(\nu)} - \lambda_j^{(\nu)}) \right]$. 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the $\lambda$-parameter estimates converge to the desired 
degree of accuracy. If $x_{Mi} + x_{iM} > \sum_{j=1}^{K} (x_{ij} + x_{ji})$ for some $i$, then as for Model A an 
alternate formula must be used in place of step 2 above. In such cases, step 2 is replaced by 


Kea Thy + Xu) | 


(2 LI (Og, 2‘ Xji) /Q we i” ae yy] (1 = eed, 2, ae 


where $h$ is chosen at each step of the iteration so that $\lambda_h^{(\nu)} \ge \lambda_j^{(\nu)}$ for all $j = 1, 2, \ldots, K$. 


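MLE’s for Parameters Under Model D-U 


1. $p_{ij}^{(0)} = x_{ij}/x_{\cdot\cdot}$, $\lambda_{1i}^{(0)} = x_{M\cdot}/n$, and $\lambda_{2j}^{(0)} = x_{\cdot M}/n$. 


2. $p_{ij}^{(\nu+1)} = \frac{1}{n} \left\{ x_{ij} + x_{iM} \left[ \lambda_{2j}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{2h}^{(\nu)} p_{ih}^{(\nu)} \right] + x_{Mj} \left[ \lambda_{1i}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{1h}^{(\nu)} p_{hj}^{(\nu)} \right] \right\}$ 

   $\lambda_{1i}^{(\nu+1)} = \sum_{j=1}^{K} x_{Mj} \left[ \lambda_{1i}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{1h}^{(\nu)} p_{hj}^{(\nu)} \right] \div \sum_{j=1}^{K} \left[ x_{ij} / (1 - \lambda_{1i}^{(\nu)} - \lambda_{2j}^{(\nu)}) \right]$ 

   $\lambda_{2j}^{(\nu+1)} = \sum_{i=1}^{K} x_{iM} \left[ \lambda_{2j}^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_{2h}^{(\nu)} p_{ih}^{(\nu)} \right] \div \sum_{i=1}^{K} \left[ x_{ij} / (1 - \lambda_{1i}^{(\nu)} - \lambda_{2j}^{(\nu)}) \right]$ 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the $\lambda$-parameter estimates converge to the desired 
degree of accuracy. 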
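
MLE’s for Parameters Under Model E-U 


1. $p_{ij}^{(0)} = x_{ij}/x_{\cdot\cdot}$ and $\lambda_i^{(0)} = (x_{M\cdot} + x_{\cdot M})/2n$. 


2. $p_{ij}^{(\nu+1)} = \frac{1}{n} \left\{ x_{ij} + x_{iM} \left[ \lambda_j^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_h^{(\nu)} p_{ih}^{(\nu)} \right] + x_{Mj} \left[ \lambda_i^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_h^{(\nu)} p_{hj}^{(\nu)} \right] \right\}$ 

   $\lambda_i^{(\nu+1)} = \left\{ \sum_{j=1}^{K} \left( x_{jM} \left[ \lambda_i^{(\nu)} p_{ji}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_h^{(\nu)} p_{jh}^{(\nu)} \right] + x_{Mj} \left[ \lambda_i^{(\nu)} p_{ij}^{(\nu)} \Big/ \sum_{h=1}^{K} \lambda_h^{(\nu)} p_{hj}^{(\nu)} \right] \right) \right\} \times \left\{ \sum_{j=1}^{K} (x_{ij} + x_{ji}) / (1 - \lambda_i^{(\nu)} - \lambda_j^{(\nu)}) \right\}^{-1}$ 


Step 2 is repeated for $\nu = 0, 1, 2, \ldots$ until the $\lambda$-parameter estimates converge to the desired 
degree of accuracy. 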